Abstract

An asymptotic theory is developed for computing volumes of regions in the parameter
space of a directed Gaussian graphical model that are obtained by bounding partial
correlations. We study these volumes using the method of real log canonical thresholds from
algebraic geometry. Our analysis involves the computation of the singular loci of
correlation hypersurfaces. Statistical applications include the strong-faithfulness assumption
for the PC-algorithm, and the quantification of confounder bias in causal inference. A
detailed analysis is presented for trees, bow-ties, tripartite graphs, and complete graphs.

1 Introduction 

Extensive theory has been established in recent years for causal inference based on directed 
acyclic graph (DAG) models. A popular way for estimating a DAG model from observational 
data employs partial correlation testing to infer the conditional independence relations in the 
model. In this paper, we apply algebraic geometry and singularity theory to analyze partial 
correlations in the Gaussian case. The objects of our study are algebraic hypersurfaces in the 
parameter space of a given graph that encode conditional independence statements. 

We begin with definitions for graphical models in statistics. A DAG is a pair $G = (V, E)$
consisting of a set $V$ of nodes and a set $E$ of directed edges with no directed cycle. We
usually take $V = \{1, 2, \ldots, p\}$ and we associate random variables $X_1, X_2, \ldots, X_p$ with the
nodes. Directed edges are denoted by $(i, j)$ or $i \to j$. The skeleton of a DAG $G$ is the
underlying undirected graph obtained by removing the arrowheads. A node $i$ is an ancestor
of $j$ if there is a directed path $i \to \cdots \to j$, and a configuration $i \to k$, $j \to k$ is a collider at $k$.
Finally, we assume that the vertices are topologically ordered, that is, $(i, j) \in E$ implies $i < j$.

Every DAG $G$ specifies a Gaussian graphical model as follows. The adjacency matrix $A_G$
is the strictly upper triangular matrix whose entry in row $i$ and column $j$ is a parameter $a_{ij}$ if
$(i, j) \in E$ and it is zero if $(i, j) \notin E$. The Gaussian graphical model is defined by the structural
equation model $X = A_G^T X + \varepsilon$, where $X = (X_1, \ldots, X_p)^T$. We assume that $\varepsilon \sim \mathcal{N}(0, I)$, where
$I$ is the $p \times p$ identity matrix. Then the concentration matrix of this model equals

\[ K = (A_G - I)(A_G - I)^T. \]

Key words and phrases: causal inference, PC-algorithm, (strong) faithfulness, real log canonical threshold, 
resolution of singularities, partial correlation, real radical ideal, asymptotics of integrals, almost-principal 
minor, directed acyclic graph, Gaussian graphical model, algebraic statistics, singular learning theory. 
Since $\det(K) = 1$, the covariance matrix $\Sigma = K^{-1}$ is equal to the adjoint of $K$. The entries
of $K$ and $\Sigma$ are polynomials in the parameters $a_{ij}$. Our parameter space for this DAG model
will always be a full-dimensional subset $\Omega$ of $\mathbb{R}^{|E|}$.
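
The construction above is easy to check on a small instance. The following Python sketch is a minimal illustration, not code from this paper: it builds $K = (A_G - I)(A_G - I)^T$ for the chain $1 \to 2 \to 3$ with the hypothetical edge weights $a_{12} = 2$ and $a_{23} = 3$, and confirms that $\det(K) = 1$, so that the adjoint of $K$ is the covariance matrix. The helper names `det`, `concentration_matrix` and `adjugate` are our own, and exact rational arithmetic is an assumption made for readability.

```python
from fractions import Fraction


def det(m):
    """Determinant by Laplace expansion along the first row (small matrices only)."""
    if not m:
        return Fraction(1)
    return sum(
        (-1) ** c * m[0][c] * det([row[:c] + row[c + 1:] for row in m[1:]])
        for c in range(len(m))
    )


def concentration_matrix(p, weights):
    """Return K = (A - I)(A - I)^T for edge weights {(i, j): a_ij} on nodes 1..p."""
    b = [[Fraction(0)] * p for _ in range(p)]
    for (i, j), a in weights.items():
        b[i - 1][j - 1] = Fraction(a)
    for i in range(p):
        b[i][i] = Fraction(-1)  # b is now A - I
    return [[sum(b[r][k] * b[c][k] for k in range(p)) for c in range(p)]
            for r in range(p)]


def adjugate(m):
    """Transpose of the cofactor matrix of m."""
    n = len(m)

    def cofactor(r, c):
        minor = [row[:c] + row[c + 1:] for k, row in enumerate(m) if k != r]
        return (-1) ** (r + c) * det(minor)

    return [[cofactor(c, r) for c in range(n)] for r in range(n)]


# Hypothetical weights for the chain 1 -> 2 -> 3.
K = concentration_matrix(3, {(1, 2): 2, (2, 3): 3})
Sigma = adjugate(K)
```

For these weights the entries of `Sigma` agree with the covariances obtained directly from $X_1 = \varepsilon_1$, $X_2 = 2X_1 + \varepsilon_2$, $X_3 = 3X_2 + \varepsilon_3$.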

For any subset $S \subset V$ and distinct elements $i, j$ in $V \setminus S$, we represent the conditional
independence statement $i \perp\!\!\!\perp j \mid S$ by an almost-principal minor of either $K$ or $\Sigma$. By this we
mean the determinant of a square submatrix whose sets of row and column indices differ in exactly one element.
To be precise, $i \perp\!\!\!\perp j \mid S$ holds for the multivariate normal distribution with concentration
matrix $K$ if and only if the submatrix $K_{iR,jR}$ is singular, where $R = V \setminus (S \cup \{i, j\})$ and
$iR = \{i\} \cup R$. The determinant $\det(K_{iR,jR})$ is a polynomial in $(a_{ij})_{(i,j) \in E}$. We are interested
in the hypersurface in $\mathbb{R}^{|E|}$ defined by the vanishing of this polynomial. Indeed, the partial
correlation is equal to the algebraic expression

\[ \operatorname{corr}(i, j \mid S) = -\frac{\det(K_{iR,jR})}{\sqrt{\det(K_{iR,iR}) \cdot \det(K_{jR,jR})}}. \tag{1} \]

Since the principal minors under the square root sign are strictly positive, $\operatorname{corr}(i, j \mid S) = 0$ if
and only if $\det(K_{iR,jR}) = 0$. If this holds for all $a \in \mathbb{R}^{|E|}$ then $i \perp\!\!\!\perp j \mid S$ for $G$ and we say that
$i$ is d-separated from $j$ given $S$. This translates into a combinatorial condition on the graph
$G$ as follows [16, §2.3.4]. An undirected path $P$ from $i$ to $j$ d-connects $i$ and $j$ given $S$ if

(a) every non-collider on P is not in S, 

(b) every collider on P is in S or an ancestor of a node in S. 

If $G$ has no path that d-connects $i$ and $j$ given $S$, then $i$ and $j$ are d-separated given $S$, and
$\det(K_{iR,jR}) \equiv 0$ as a function of $a$. The weight of a path $P$ is the product of all edge weights
$a_{rs}$ along this path. It was shown in [17, Equation (11)] that the numerator $\det(K_{iR,jR})$ in
(1) is a linear combination, as in (5), of the weights of all paths that d-connect $i$ to $j$ given $S$.
Our primary objects of study are the following subsets of the parameter space:

\[ \mathrm{Tube}_{i,j|S}(\lambda) = \{\omega \in \Omega : |\operatorname{corr}(i, j \mid S)| \le \lambda\}. \tag{2} \]

Here $\operatorname{corr}(i, j \mid S)$ is a function of the parameter $\omega$ (denoted $(a_{ij})_{(i,j) \in E}$ above) in the space
$\Omega$, $\lambda$ is a parameter in $[0, 1]$, and $(i, j, S)$ is a triple where $i$ and $j$ are d-connected given
$S$. These "tubes" can be seen as hypersurfaces which have been fattened up by a factor which
depends on $\lambda$ and the position on the hypersurface (see Figure 3). The volume of $\mathrm{Tube}_{i,j|S}(\lambda)$
with respect to a given measure $\varphi(\omega)\,d\omega$ on $\Omega \subset \mathbb{R}^{|E|}$ is represented by the integral

\[ V_{i,j|S}(\lambda) = \int_{\mathrm{Tube}_{i,j|S}(\lambda)} \varphi(\omega)\, d\omega. \tag{3} \]

In this paper we study the asymptotics of this integral when the parameter $\lambda$ is close to 0.
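
To make the integral (3) concrete, here is a small Python sketch, given under stated assumptions rather than as the method used in this paper. It evaluates the partial correlation from formula (1) by computing the three minors of $K$ directly, and then estimates the tube volume for the triple $(1, 3, \emptyset)$ in the chain $1 \to 2 \to 3$ by uniform sampling on $[-1, 1]^2$. The function names, the choice of graph, the number of trials and the seed are all hypothetical choices for illustration.

```python
import math
import random


def det(m):
    """Determinant by Laplace expansion along the first row (small matrices only)."""
    if not m:
        return 1.0
    return sum(
        (-1) ** c * m[0][c] * det([row[:c] + row[c + 1:] for row in m[1:]])
        for c in range(len(m))
    )


def concentration_matrix(p, weights):
    """Return K = (A - I)(A - I)^T for edge weights {(i, j): a_ij} on nodes 1..p."""
    b = [[0.0] * p for _ in range(p)]
    for (i, j), a in weights.items():
        b[i - 1][j - 1] = float(a)
    for i in range(p):
        b[i][i] = -1.0
    return [[sum(b[r][k] * b[c][k] for k in range(p)) for c in range(p)]
            for r in range(p)]


def partial_corr(K, i, j, S):
    """corr(i, j | S) via formula (1); nodes are numbered from 1."""
    p = len(K)
    R = [v for v in range(1, p + 1) if v not in S and v not in (i, j)]

    def minor(rows, cols):
        return det([[K[r - 1][c - 1] for c in cols] for r in rows])

    iR, jR = [i] + R, [j] + R
    return -minor(iR, jR) / math.sqrt(minor(iR, iR) * minor(jR, jR))


def tube_volume(lam, trials=20000, seed=0):
    """Monte Carlo estimate of the volume (3) for (i, j, S) = (1, 3, {}) in 1 -> 2 -> 3."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        K = concentration_matrix(3, {(1, 2): rng.uniform(-1, 1),
                                     (2, 3): rng.uniform(-1, 1)})
        if abs(partial_corr(K, 1, 3, [])) <= lam:
            hits += 1
    return hits / trials
```

Calling `tube_volume(10.0 ** x)` for a range of exponents `x` traces out an empirical cumulative distribution function of the kind studied below. In this chain the numerator of (1) is a multiple of the monomial $a_{12} a_{23}$, so the tube is a fattening of the two coordinate axes.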

Two applications in statistics are our motivation. The first concerns the strong-faithfulness
assumption for algorithms that learn Markov equivalence classes of DAG models by inferring
conditional independence relations. The PC-algorithm [16] is a prominent instance. Our
set-up is exactly as in [17]. The Gaussian distribution with concentration matrix $K$ is
$\lambda$-strong-faithful to a DAG $G$ if, for any $S \subset V$ and $i, j \notin S$, we have $|\operatorname{corr}(i, j \mid S)| \le \lambda$ if and
only if $i$ is d-separated from $j$ given $S$. We write $V_G(\lambda)$ for the volume of the region in $\Omega$
representing distributions that are not $\lambda$-strong-faithful. Then $V_G(\lambda)$ is an aggregate of the
volumes in (3) for all non-d-separated triples $(i, j, S)$. Zhang and Spirtes [19] proved uniform
consistency of the PC-algorithm under the strong-faithfulness assumption with $\lambda \asymp 1/\sqrt{n}$,
provided the number of nodes $p$ is fixed and sample size $n \to \infty$. In a high-dimensional, sparse
setting, Kalisch and Bühlmann [11] require strong-faithfulness with $\lambda \asymp \sqrt{\deg(G) \log(p) / n}$,
where $\deg(G)$ denotes the maximal degree (i.e., sum of indegree and outdegree) of nodes in $G$.

For the scenarios described above it is essential to determine the asymptotic behavior of
the unfaithfulness volume $V_G(\lambda)$ when $\lambda$ tends to 0. We shall address this issue using real log
canonical thresholds [1, 14, 18]. Our Section 3 establishes the existence of positive constants
$\ell, m, C$ (which depend on $G$ and $\varphi$) such that, asymptotically for $\lambda \to 0$,

\[
\begin{aligned}
V_G(\lambda) &\approx C \cdot \lambda^{\ell} \cdot (-\ln \lambda)^{m-1}, \\
V_{i,j|S}(\lambda) &\approx C' \cdot \lambda^{\ell'} \cdot (-\ln \lambda)^{m'-1}.
\end{aligned}
\tag{4}
\]

(See (9) for an exact definition of $\approx$.) This refines the results in [17] on the growth of $V_G(\lambda)$
via the geometry of the correlation hypersurfaces $\{\det(K_{iR,jR}) = 0\}$. While [17] focused on
developing bounds on $V_G(\lambda)$ for the low-dimensional as well as the high-dimensional case and
showed the importance of the number and degrees of these hypersurfaces, we here analyze the
exact asymptotic behavior of $V_G(\lambda)$ for $\lambda \to 0$ and $G$ fixed and demonstrate the importance of
the singularities of these hypersurfaces. Singularities get fattened up much more than smooth
parts of the hypersurface, and this increases the volumes (4) substantially.

Our second application concerns stratification bias in causal inference (see e.g. [7, 8]). Here,
the volume $V_{i,j|S}(\lambda)$ being large is not a problematic feature, but is in fact desired. Suppose
we want to study the effect of an exposure $E$ on a disease outcome $D$. If there is an additional
variable $C$ such that $D \to C \leftarrow E$, then stratifying (i.e. conditioning) on $C$ tends to change
the effect of $E$ on $D$. This can lead to biases in effect estimation, a phenomenon known as
collider-bias. On the other hand, if $D \leftarrow C \to E$ holds, then $C$ is a confounder and stratifying on $C$
corresponds to bias removal. In certain larger graphs, such as Greenland's bow-tie example
[7], stratifying on $C$ removes confounder-bias but at the same time introduces collider-bias. In
order to decide whether one should stratify on such a variable $C$, it is important to understand
the partial correlations involved. In this application, the volume $V_{i,j|S}(\lambda)$ can be viewed as the
cumulative distribution function of the partial correlation $\operatorname{corr}(i, j \mid S)$, and one is interested
in comparing the two cumulative distribution functions $V_{E,D|C}(\lambda)$ and $V_{E,D}(\lambda)$.

In this paper we examine $V_{i,j|S}(\lambda)$ from a geometric perspective, and we demonstrate
how this volume can be calculated using tools from singular learning theory. To derive the
asymptotics (4), the main player is the correlation hypersurface, which is the locus in $\Omega$ where
$\operatorname{corr}(i, j \mid S)$ vanishes. The first question is whether this hypersurface is smooth, and, if not,
one needs to analyze the nature of its singularities. We study these questions for various
classes of interesting causal models, using methods from computational algebraic geometry.

The remainder of this paper is organized as follows: In Section 2 we introduce the families
of DAGs which we will be working with throughout. Example 2.1 illustrates the algebraic
computations that are involved in our analysis. We also discuss some simulation results, which
indicate the importance of singularities when studying the volume $V_G(\lambda)$ of strong-unfaithful
distributions. Section 3 presents the connection to singular learning theory [14, 18] and
explains how this theory can be used to compute the volumes of the tubes $\mathrm{Tube}_{i,j|S}(\lambda)$. Example
3.1 illustrates our theoretical results for some very simple polynomials in two variables.

In Section 4 we develop algebraic algorithms for analyzing the singularities of the
correlation hypersurfaces. We show that, for the polynomials $\det(K_{iR,jR})$ of interest, the real
singular locus is often much simpler than the complex singular locus. For instance, Theorem
4.1 states that these hypersurfaces are always smooth for complete DAGs with up to six nodes.
In Section 5 we study the singularities and the volumes (3) for trees without colliders.

Section 6 focuses on our second application, namely bias reduction in causal inference.
Problems 6.2 and 6.7 offer precise versions of conjectures by Greenland [7], in terms of
comparing different volumes $V_{i,j|S}(\lambda)$ for fixed $G$. We establish some instances of these conjectures.

In Section 7 we introduce more advanced methods, based on the resolution of singularities
[9, 10], for finding the exponents $\ell$ and $m$ in (4). Finally, in Section 8 we present some new
results on computing the constants $C$ and $C'$ in our asymptotics (4) for tube volumes.

2 Four classes of graphs 

In this article we will be primarily working with four classes of DAGs: 

i) Complete graphs: We denote the complete DAG on $p$ nodes by $K_p$. The corresponding
matrix $A_{K_p}$ is strictly upper triangular and all $\binom{p}{2}$ parameters $a_{ij}$ are present.

ii) Trees: We call a DAG $G$ a tree graph if the skeleton of $G$ is a rooted tree and all edges
point away from the root (i.e. $G$ has no colliders). We are particularly interested in the
most extreme trees, namely star and chain graphs. We denote the star graph shown in
Figure 1(a) by $\mathrm{Star}_p$ and the chain graph shown in Figure 1(b) by $\mathrm{Chain}_p$.

iii) Complete tripartite graphs: Let $A, B \subset V$ with $A \cap B = \emptyset$. Then we denote by $A \Rightarrow B$
the complete bipartite graph where $(a, b) \in E$ for all $a \in A$ and $b \in B$. A complete
tripartite graph is denoted by $\mathrm{Tripart}_{p,p'}$ with $1 \le p' \le p - 3$. It corresponds to the
DAG $\{1, 2\} \Rightarrow \{3, \ldots, p - p'\} \Rightarrow \{p - p' + 1, \ldots, p\}$ and is shown in Figure 1(c).



[Figure 1, panels: (a) $\mathrm{Star}_p$, (b) $\mathrm{Chain}_p$, (c) $\mathrm{Tripart}_{p,p'}$, (d) $\mathrm{Bow}_p$]

Figure 1: Various classes of graphs.
iv) Bow-ties: We define a bow-tie to be a complete tripartite graph $\mathrm{Tripart}_{p,2}$ with two
additional edges, namely $(1, p - 1)$ and $(2, p)$. A bow-tie is denoted by $\mathrm{Bow}_p$ and is
shown in Figure 1(d). Bow-ties with $p = 5$ feature prominently in Greenland's study [7].

The following example serves as a preview to the topics covered in this paper. 

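Example 2.1. We illustrate our objects of study for the tripartite graph $G = \mathrm{Tripart}_{6,2}$.
This DAG model has eight free parameters, namely the unknowns in the matrix

\[
A_G = \begin{pmatrix}
0 & 0 & a_{13} & a_{14} & 0 & 0 \\
0 & 0 & a_{23} & a_{24} & 0 & 0 \\
0 & 0 & 0 & 0 & a_{35} & a_{36} \\
0 & 0 & 0 & 0 & a_{45} & a_{46} \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0
\end{pmatrix}.
\]

One conditional independence statement of interest is $1 \perp\!\!\!\perp 2 \mid \{5, 6\}$. Its correlation
hypersurface in $\mathbb{R}^8$ is defined by the almost-principal minor in $\Sigma$ with rows 156 and columns 256, or
the almost-principal minor in $K$ with rows 134 and columns 234. That determinant equals

\[
\begin{aligned}
& (1 + a_{46}^2)\, a_{13} a_{23} a_{35}^2 + (1 + a_{45}^2)\, a_{13} a_{23} a_{36}^2 + (1 + a_{35}^2)\, a_{14} a_{24} a_{46}^2 + (1 + a_{36}^2)\, a_{14} a_{24} a_{45}^2 \\
& \quad + a_{13} a_{24} a_{35} a_{45} + a_{13} a_{24} a_{36} a_{46} + a_{14} a_{23} a_{35} a_{45} + a_{14} a_{23} a_{36} a_{46} \\
& \quad - 2 a_{13} a_{23} a_{35} a_{36} a_{45} a_{46} - 2 a_{14} a_{24} a_{35} a_{36} a_{45} a_{46}.
\end{aligned}
\tag{5}
\]

This is a weighted sum of all paths which d-connect nodes 1 and 2 given $\{5, 6\}$. The first
term in the formula (5) for $f = \det(K_{134,234})$ corresponds to the path $1 \to 3 \to 5 \leftarrow 3 \leftarrow 2$
in $G = \mathrm{Tripart}_{6,2}$, and the last term corresponds to the path $1 \to 4 \to 5 \leftarrow 3 \to 6 \leftarrow 4 \leftarrow 2$.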

Let $\varphi$ be the Lebesgue probability measure on the cube $\Omega = [-1, +1]^8$. The expression
$V_{1,2|56}(\lambda)$ defined in (3) is the volume of the region of parameters $a \in \Omega$ that satisfy

\[ |\operatorname{corr}(1, 2 \mid 5, 6)| = \frac{|f|}{\sqrt{\det(K_{134,134})}\, \sqrt{\det(K_{234,234})}} \le \lambda. \]

As a function in $\lambda$, the volume $V_{1,2|56}(\lambda)$ is a cumulative distribution function on $[0, \infty)$. Our
aim in this article is to determine the asymptotics of such a function for $\lambda \to 0$.

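In Section 3 we shall explain the form of the asymptotics that is promised in (4). In order
to find the exponents $\ell$ and $m$, the first step is to run the algebraic algorithm in Section 4.
This answers the question whether the hypersurface in $\Omega$ defined by $f = 0$ has any singular
points. The set of such points, known as the singular locus, is the zero set in $\Omega$ of the ideal

\[ J = \left\langle f,\ \frac{\partial f}{\partial a_{13}},\ \frac{\partial f}{\partial a_{14}},\ \frac{\partial f}{\partial a_{23}},\ \frac{\partial f}{\partial a_{24}},\ \frac{\partial f}{\partial a_{35}},\ \frac{\partial f}{\partial a_{36}},\ \frac{\partial f}{\partial a_{45}},\ \frac{\partial f}{\partial a_{46}} \right\rangle. \]

The tools of Section 4 reveal that its real radical [15] is the intersection of three prime ideals:

\[
\begin{aligned}
\sqrt[\mathbb{R}]{J} &= \left\langle \text{entries of } \begin{pmatrix} a_{13} & a_{14} \\ a_{23} & a_{24} \end{pmatrix} \begin{pmatrix} a_{35} & a_{36} \\ a_{45} & a_{46} \end{pmatrix} \right\rangle \\
&= \langle a_{13}, a_{14}, a_{23}, a_{24} \rangle \cap \langle a_{35}, a_{36}, a_{45}, a_{46} \rangle \cap \left\langle 2 \times 2\text{-minors of } \begin{pmatrix} a_{13} & a_{23} & a_{45} & a_{46} \\ a_{14} & a_{24} & -a_{35} & -a_{36} \end{pmatrix} \right\rangle.
\end{aligned}
\]

Thus the hypersurface $\{f = 0\}$ is singular. Its singular locus decomposes into three irreducible
varieties, namely two linear spaces of dimension 4 and one determinantal variety of dimension
5. In Section 6 we return to this example, with focus on a statistical application of the
cumulative distribution function $V_{1,2|56}(\lambda)$, and we will then show that $(\ell, m)$ equals $(1, 1)$. □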

This paper extends the work of Uhler, Raskutti, Bühlmann and Yu in [17] on the geometry
of the strong-faithfulness assumption in the PC-algorithm. Upper and lower bounds on the
volume $V_G(\lambda)$ of the unfaithful region $\mathrm{Tube}_G(\lambda)$ for the low- as well as the high-dimensional
setting were derived in [17, §5]. These bounds involved only the number $|E|$ of parameters
and the degrees of the correlation hypersurfaces $\{\det(K_{iR,jR}) = 0\}$. The new insight in the
current paper is that singularities are essential for the asymptotic behavior of $V_G(\lambda)$ for $\lambda \to 0$.

What led us to this insight was taking a closer look at the simulation results for trees. In 
[17, §6.1.1] trees were still treated as one single class. We subsequently examined the difference 




Figure 2: Proportion of $\lambda$-strong-unfaithful distributions for chains compared to stars.
between stars and chains, depicted in Figure 1(a) and 1(b). Our simulation results for $\mathrm{Star}_p$
and $\mathrm{Chain}_p$ are shown in Figure 2. We shall now explain the curves in these diagrams.

The left diagram in Figure 2 is for $p = 6$ nodes and the right diagram is for $p = 10$.
Each curve is the graph of the cumulative distribution function $V_G(\lambda)$ but with the x-axis
transformed to a logarithmic scale (with base 10). Thus we depict the graph of the function

\[ (-\infty, 0] \to [0, 1], \quad x \mapsto V_G(10^x). \tag{6} \]

The red curve is for $G = \mathrm{Chain}_p$ and the blue curve is for $G = \mathrm{Star}_p$. These curves were
computed by simulation: we sampled the parameter $a$ from the uniform distribution on
$[-1, 1]^{p-1}$ and we recorded the proportion of trials that landed in $\mathrm{Tube}_G(\lambda)$ for various values
of $\lambda$. The diagrams show clearly that $V_G(\lambda)$ is smaller for star graphs than for chain graphs.

A theoretical explanation for these experimental results will be given in Section 5. Our
asymptotic theory predicts the behavior of these curves as $x = \log(\lambda)$ tends to $-\infty$. The point
is that the correlation hypersurfaces for chain graphs have deeper singularities than those for
star graphs. The equation of any such hypersurface for a tree is the product of a monomial
and a strictly positive polynomial. This enables us to apply Proposition 3.5. In Theorem 5.1
and Corollary 5.3 we shall determine the constants $\ell$, $m$ and $C$ of (4) exactly when the graph
$G$ is a tree. We shall also address the question of how to obtain $\ell$, $m$ and $C$ from simulations.

Before we get to graphical models, however, we first need to develop the mathematics
needed to analyze $V_G(\lambda)$. This will be done, in a self-contained manner, in the next section.

3 Computing the volume of a tube 

We now introduce the basics regarding the computation of integrals like the one in (3), and we
explain why asymptotic formulas like (4) can be expected. While this section is foundational
for what is to follow, no reference to any statistical application is made until Theorem 3.8. It
can be read from first principles and might be of independent interest to mathematicians.

Let $\Omega \subset \mathbb{R}^d$ be a compact, full-dimensional, semianalytic subset and consider a probability
measure $\varphi(\omega)\,d\omega$ on $\Omega$ where $d\omega$ is the standard Lebesgue measure and $\varphi : \Omega \to \mathbb{R}$ is a
real-analytic function. Also, fix an analytic function $f : \Omega \to \mathbb{R}$ whose hypersurface $\{\omega : f(\omega) = 0\}$
has non-empty intersection with the interior of $\Omega$. We are interested in the volume $V(\lambda)$ with
respect to the measure $\varphi$ of the region

\[ \mathrm{Tube}(\lambda) = \{\omega \in \Omega : |f(\omega)| \le \lambda\}. \]

Here $\lambda > 0$ is a parameter that is assumed to be small. In later sections, we often take $\Omega$ to be
the cube $[-1, +1]^d$, with $\varphi$ its Lebesgue probability measure, and $f$ is usually a polynomial.

The asymptotics of the volume function $V(\lambda)$ depends on the singularities of the
hypersurface $\{f = 0\}$. This phenomenon is illustrated in Figure 3. Our measure for the complexity
of the singularities of $f$ is a pair $(\ell, m)$ of non-negative real numbers. That pair is the real log
canonical threshold of $f$. It is related to the volume $V(\lambda)$ for small values of $\lambda$ by the formula

\[ V(\lambda) \approx C \lambda^{\ell} (-\ln \lambda)^{m-1}. \tag{7} \]

Here $C$ is a positive real constant whose study we shall defer until Section 8.
[Figure 3, panels: (a) $f(x, y) = x$, (b) $f(x, y) = xy$, (c) $f(x, y) = x^2 y^3$, (d) $f(x, y) = x^3 y - x y^3$]

Figure 3: Tubes for various polynomials in two variables.



Example 3.1. Let $d = 2$ and $\varphi$ the Lebesgue probability measure on the square $\Omega =
[-1, +1]^2$. Our problem is to compute the area of the tube $\{(x, y) \in \Omega : |f(x, y)| \le \lambda\}$. Here
$f(x, y)$ is one of the four simple polynomials below whose tubes are shown in Figure 3.

(a) $f(x, y) = x$: The corresponding tube is a rectangle and its area equals

\[ V(\lambda) = \lambda. \]

So, in this example, we have $(\ell, m) = (1, 1)$ and $C = 1$. For other lines, the value of $C$
will change. Proposition 3.6 below shows that $(\ell, m) = (1, 1)$ for smooth hypersurfaces.

(b) $f(x, y) = xy$: The tube in Figure 3(b) consists of four copies of a region that is the
union of a small rectangle and a certain area under a hyperbola. Using calculus, we find

\[ V(\lambda) = 4 \left( \lambda + \int_{\lambda}^{1} \frac{\lambda}{x}\, dx \right) \cdot \frac{1}{4} = \lambda(-\ln \lambda) + \lambda. \]

The logarithm function appears in this case. We have $(\ell, m) = (1, 2)$ and $C = 1$.

(c) $f(x, y) = x^2 y^3$: The corresponding tube is shown in Figure 3(c). Its area equals

\[ V(\lambda) = 4 \left( \lambda^{1/2} + \int_{\lambda^{1/2}}^{1} \lambda^{1/3} x^{-2/3}\, dx \right) \cdot \frac{1}{4} = 3\lambda^{1/3} - 2\lambda^{1/2}. \]

So, the real log canonical threshold equals $(\ell, m) = (\frac{1}{3}, 1)$, and we have $C = 3$. See
Proposition 3.5 for a formula for $(\ell, m)$ when $f$ is a monomial in any number of variables.

(d) $f(x, y) = xy(x + y)(x - y)$: The corresponding tube is shown in Figure 3(d). This
example is a slight generalization of (b). As in (b) there is just one singularity at the
origin and it is given by the intersection of lines. However, there is no simple closed
formula for the area $V(\lambda)$, as this amounts to evaluating a non-trivial abelian integral.
We shall see in Example 7.3 that this real log canonical threshold equals $(\ell, m) = (\frac{1}{2}, 1)$.

For general bivariate polynomials $f(x, y)$ we are facing a hard calculus problem, namely
integrating the function $y = y(x)$ that is defined implicitly by $f(x, y) = \lambda$. We can approach
this by expanding $y$ as a Puiseux series in $\lambda$ whose coefficients depend on $x$. Integrating these
coefficients leads to asymptotic formulas in $\lambda$. These are consistent with what is to follow. □
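
The closed forms in parts (b) and (c) can be checked numerically. The Python sketch below is an illustration under our own assumptions (midpoint rule, hypothetical function names): by symmetry, the volume with respect to the Lebesgue probability measure on $[-1, +1]^2$ equals the ordinary area of the part of the tube inside the unit square $[0, 1]^2$, which is the integral of $\min(1, h(x))$ for the appropriate height function $h$.

```python
import math


def quadrant_area(height, n=100_000):
    """Midpoint rule for the area of {0 <= x, y <= 1 : y <= height(x)}."""
    h = 1.0 / n
    return h * sum(min(1.0, height((k + 0.5) * h)) for k in range(n))


def area_xy(lam):
    """Numerical tube volume for f = xy: the tube is y <= lam / x."""
    return quadrant_area(lambda x: lam / x)


def area_x2y3(lam):
    """Numerical tube volume for f = x^2 y^3: the tube is y <= (lam / x^2)^(1/3)."""
    return quadrant_area(lambda x: (lam / x ** 2) ** (1.0 / 3.0))


def closed_xy(lam):
    """Closed form from part (b)."""
    return lam * (-math.log(lam)) + lam


def closed_x2y3(lam):
    """Closed form from part (c)."""
    return 3 * lam ** (1.0 / 3.0) - 2 * lam ** 0.5
```

Comparing `area_xy(lam)` with `closed_xy(lam)`, and likewise for the second pair, should give agreement up to the quadrature error for any `lam` in $(0, 1)$.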
We now return to the general setting defined at the beginning of this section. Let $W$ be a
random variable taking values in $\Omega$ with distribution $\varphi$. The volume $V(\lambda)$ with respect to the
measure $\varphi$ can then be viewed as the cumulative distribution function of the random variable
$|f(W)|$. The corresponding probability density function $v(\lambda) = dV/d\lambda$ is called the state
density function. Its Mellin transform is known as the zeta function of $f$. It is denoted by

\[ \zeta(z) = \int_0^{\infty} \lambda^{-z} v(\lambda)\, d\lambda = \int_{\Omega} |f(\omega)|^{-z} \varphi(\omega)\, d\omega \quad \text{for } z \in \mathbb{C}. \]

According to asymptotic theory [1, 14, 18], our volume has the asymptotic series expansion

\[ V(\lambda) \approx \sum_{\ell} \sum_{m=1}^{d} C_{\ell,m}\, \lambda^{\ell} (-\ln \lambda)^{m-1}. \tag{8} \]

Here the index $\ell$ runs over some arithmetic progression of positive rational numbers and $d$ is
the dimension of the parameter space $\Omega$. The equation (8) is valid for sufficiently small $\lambda > 0$.
To be precise, writing $V(\lambda) \approx \sum_{i=1}^{\infty} g_i(\lambda)$, where $g_1(\lambda) > g_2(\lambda) > \cdots$ for small $\lambda$, means that

\[ \lim_{\lambda \to 0} \frac{V(\lambda) - \sum_{i=1}^{k} g_i(\lambda)}{g_k(\lambda)} = 0 \quad \text{for each positive integer } k. \tag{9} \]

Using the little-o notation, this is equivalent to $V(\lambda) = \sum_{i=1}^{k} g_i(\lambda) + o(g_k(\lambda))$ as $\lambda \to 0$
for each positive integer $k$. It is a common misconception to think that the infinite series
converges to $V(\lambda)$ for each fixed $\lambda$ when $\lambda$ is small. Rather, it means that for each fixed $k$,
the $k$-term approximation for $V(\lambda)$ gets better as $\lambda \to 0$. We will primarily be interested in
the first term approximation (7).
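
A short numerical illustration may help to calibrate expectations about the first term approximation. The Python sketch below uses only the exact formula $V(\lambda) = \lambda(-\ln\lambda) + \lambda$ from Example 3.1(b); the function names are hypothetical. Here the leading term is $\lambda(-\ln\lambda)$, and the relative error of the one-term approximation equals $1/(-\ln\lambda)$, which tends to zero only logarithmically.

```python
import math


def volume_xy(lam):
    """Exact tube volume for f = xy, taken from Example 3.1(b)."""
    return lam * (-math.log(lam)) + lam


def first_term(lam):
    """Leading term of the expansion, with (l, m) = (1, 2) and C = 1."""
    return lam * (-math.log(lam))


def relative_error(lam):
    """Relative error of the one-term approximation; equals 1 / (-ln lam) here."""
    return (volume_xy(lam) - first_term(lam)) / first_term(lam)


for k in (2, 4, 8, 16):
    lam = 10.0 ** (-k)
    print(f"lambda = 1e-{k}: relative error = {relative_error(lam):.4f}")
```

The slow decay suggests that, in cases with $m > 1$, exponents read off from simulations at moderate values of $\lambda$ should be interpreted with care.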

Definition 3.2 ([14, §4.1], [18, §7.1]). We here define the real log canonical threshold $(\ell, m)$
of $f$ over $\Omega$ with respect to $\varphi$. This is a pair in $\mathbb{Q}_+ \times \mathbb{Z}_+$ which we denote by $\mathrm{RLCT}_\Omega(f; \varphi)$.
It measures the complexity of the singularities of the hypersurface defined by $f(\omega) = 0$.
The following four definitions of $\mathrm{RLCT}_\Omega(f; \varphi) = (\ell, m)$ are known to be equivalent:

(i) For large $N > 0$, the Laplace integral

\[ Z(N) = \int_{\Omega} e^{-N |f(\omega)|} \varphi(\omega)\, d\omega \]

is asymptotically $C N^{-\ell} (\ln N)^{m-1}$ for some constant $C$.

(ii) The zeta function

\[ \zeta(z) = \int_{\Omega} |f(\omega)|^{-z} \varphi(\omega)\, d\omega \]

has its smallest pole at $z = \ell$ and that pole has multiplicity $m$.

(iii) For small $\lambda > 0$, the volume function

\[ V(\lambda) = \int_{|f(\omega)| \le \lambda} \varphi(\omega)\, d\omega \]

is asymptotically $C \lambda^{\ell} (-\ln \lambda)^{m-1}$ for some constant $C$.

(iv) For small $\lambda > 0$, the state density function

\[ v(\lambda) = \frac{d}{d\lambda} \int_{|f(\omega)| \le \lambda} \varphi(\omega)\, d\omega \]

is asymptotically $C \lambda^{\ell-1} (-\ln \lambda)^{m-1}$ for some constant $C$.

If the real analytic hypersurface $\{\omega \in \Omega : f(\omega) = 0\}$ is empty, we set $\ell = \infty$ and we leave
$m$ undefined. We order the pairs $(\ell, m)$ reversely by the size of $\lambda^{\ell} (-\ln \lambda)^{m-1}$ for sufficiently
small $\lambda > 0$. This means that $(\ell_1, m_1) < (\ell_2, m_2)$ if $\ell_1 < \ell_2$ or if $\ell_1 = \ell_2$ and $m_1 > m_2$. □

Let us provide some intuition for the ordering of the pairs $(\ell, m)$. The real log canonical
threshold is a measure of complexity for singularities. Analytic varieties can be stratified into
subsets where this measure is constant. The highest stratum contains the smooth points of
the variety. As we go deeper, to strata with lower real log canonical thresholds, we encounter
singularities of increasing complexity. The volumes of $\lambda$-fattenings of deeper singularities will
also be larger than that of their less complex counterparts. For instance, in Figures 3(b) and
3(c) the singular locus of both examples consists of just the origin, but we observe that the
$\lambda$-fattening of the origin in Figure 3(c) is larger than in Figure 3(b). See also Example 3.7.

Example 3.3. Let $f(\omega) = \omega_1^2 + \omega_2^2 + \cdots + \omega_d^2$ and $\varphi$ the Lebesgue probability measure on
$\Omega = [-1, 1]^d$. Then $\mathrm{Tube}(\lambda)$ is the standard ball of radius $\lambda^{1/2}$, whose $\varphi$-volume is

\[ V(\lambda) = \frac{\pi^{d/2}}{2^d \cdot \Gamma(\frac{d}{2} + 1)}\, \lambda^{d/2}. \]

By Definition 3.2 (iii), here the real log canonical threshold equals $\mathrm{RLCT}_\Omega(f; \varphi) = (d/2, 1)$. □

We now list some formulas for computing the real log canonical threshold. A first useful
fact is that $\mathrm{RLCT}_\Omega(f; \varphi)$ is independent of the underlying measure $\varphi$ as long as it is positive
everywhere. We can thus assume that $\varphi$ is the uniform distribution on $\Omega$.

Proposition 3.4. If $\varphi : \Omega \to \mathbb{R}$ is strictly positive and 1 denotes the constant unit function
on $\Omega$, then

\[ \mathrm{RLCT}_\Omega(f; \varphi) = \mathrm{RLCT}_\Omega(f; 1). \]

Proof. See [14, Lemma 3.8]. □

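Proposition 3.5. Suppose that $\Omega$ is a neighborhood of the origin. If $f(\omega) = \omega_1^{\kappa_1} \cdots \omega_d^{\kappa_d}\, g(\omega)$
where $g : \Omega \to \mathbb{R}$ does not have any real zeros, then $\mathrm{RLCT}_\Omega(f; 1) = (\ell, m)$ where

\[ \ell = \min_i \frac{1}{\kappa_i} \quad \text{and} \quad m = \left| \operatorname{argmin}_i \frac{1}{\kappa_i} \right|. \]

Proof. This is a special case of Theorem 7.1 which will be proved later. □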
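
The formula of Proposition 3.5 is simple enough to state as a few lines of code. The following Python sketch is our own illustration (the name `monomial_rlct` is hypothetical): it takes the exponent vector $(\kappa_1, \ldots, \kappa_d)$ and returns the pair $(\ell, m)$, using the observation that $\min_i 1/\kappa_i = 1/\max_i \kappa_i$ and that $m$ counts how many exponents attain this maximum. Exponents equal to zero are assumed to contribute $1/\kappa_i = \infty$ and are ignored.

```python
from fractions import Fraction


def monomial_rlct(kappa):
    """Pair (l, m) of Proposition 3.5 for the monomial with exponent vector kappa."""
    positive = [k for k in kappa if k > 0]
    if not positive:
        raise ValueError("at least one exponent must be positive")
    k_max = max(positive)
    # l = 1 / max exponent; m = number of variables attaining that maximum.
    return Fraction(1, k_max), positive.count(k_max)
```

Applied to the exponent vectors $(1, 0)$, $(1, 1)$ and $(2, 3)$ of Example 3.1(a)-(c), this reproduces the pairs $(1, 1)$, $(1, 2)$ and $(\frac{1}{3}, 1)$ found there by direct integration.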

Recall that an analytic hypersurface $\{f(\omega) = 0\}$ is singular at a point $\omega \in \Omega$ if $\omega$ satisfies

\[ f(\omega) = 0 \quad \text{and} \quad \frac{\partial f}{\partial \omega_i}(\omega) = 0 \quad \text{for } i = 1, \ldots, d. \]

If the hypersurface is not singular at any point $\omega \in \Omega$, then it is said to be smooth.

Proposition 3.6. If the hypersurface $\{f(\omega) = 0\}$ is smooth then $\mathrm{RLCT}_\Omega(f; 1) = (1, 1)$.

Proof. This is also a special case of Theorem 7.1. □

Example 3.7. Following up on Example 3.1, we now consider an arbitrary monomial function
$f(x, y) = x^s y^t$ on the square $\Omega = [-1, 1]^2$. The tube looks as in Figure 3(c). Its area satisfies

\[
V(\lambda) \approx
\begin{cases}
C \lambda^{1/t} & \text{if } s < t, \\
C \lambda^{1/s} & \text{if } s > t, \\
C \lambda^{1/s} (-\ln \lambda) & \text{if } s = t.
\end{cases}
\]

This formula for the asymptotics (7) follows from Definition 3.2 and Proposition 3.5. □

For the statistical applications in this paper, the relevant functions $f$ are polynomials.
They are determinants $f = \det(K_{iR,jR})$, where $R = V \setminus (S \cup \{i, j\})$ as in Section 1. Let
$\mathrm{RLCT}(i, j \mid S)$ denote the corresponding real log canonical threshold over $\Omega = [-1, 1]^E$ with
respect to a positive density $\varphi$. The theory developed so far says that the real log canonical
threshold of the correlation hypersurface gives an asymptotic volume formula for $V_{i,j|S}(\lambda)$.

Theorem 3.8. If $\varphi$ satisfies the assumptions in Proposition 3.4, then as $\lambda$ tends to zero, the
volume of the region $\mathrm{Tube}_{i,j|S}(\lambda)$ (see (2)) is asymptotically

\[ V_{i,j|S}(\lambda) \approx C \lambda^{\ell} (-\ln \lambda)^{m-1} \]

for some constant $C > 0$ (which only depends on $G$) and $(\ell, m) = \mathrm{RLCT}(i, j \mid S)$.

Proof. By part (iii) in Definition 3.2, the desired pair $(\ell, m)$ is the real log canonical threshold
of the partial correlation $f = \operatorname{corr}(i, j \mid S)$. This is the algebraic (and hence analytic) function
in (1). This function differs from the polynomial $\det(K_{iR,jR})$ by a denominator that does not
vanish over $\Omega$. That denominator is a unit in the ring of real analytic functions over $\Omega$, and
multiplying by a unit does not change the RLCT of an analytic function [14, §4.1]. □

We close this section by relating our results directly to the study of unfaithfulness in [17]. 

Corollary 3.9. Under the assumptions in Theorem 3.8, as $\lambda$ tends to zero, the volume of
$\lambda$-strong-unfaithful distributions satisfies

\[ V_G(\lambda) \approx C \lambda^{\ell} (-\ln \lambda)^{m-1} \]

for some constant $C > 0$. Here $(\ell, m)$ is the minimum of the pairs $\mathrm{RLCT}(i, j \mid S)$, where
$(i, j, S)$ runs over all triples in the DAG $G$ such that $i$ is not d-separated from $j$ given $S$.

Proof. The function $V_G(\lambda)$ is the volume of the union of the regions $\mathrm{Tube}_{i,j|S}(\lambda)$. Thus,

\[ \max_{i,j,S} V_{i,j|S}(\lambda) \le V_G(\lambda) \le \sum_{i,j,S} V_{i,j|S}(\lambda). \]

Asymptotically, for small positive values of $\lambda$, both the lower and upper bounds vary like a
constant multiple of $\lambda^{\ell} (-\ln \lambda)^{m-1}$ where $(\ell, m)$ is the minimum over all pairs $\mathrm{RLCT}(i, j \mid S)$.
In this minimum, $(i, j, S)$ runs over all triples such that $i$ and $j$ are d-connected given $S$. □
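
The minimum in Corollary 3.9 is taken with respect to the order of Definition 3.2, in which a smaller exponent of $\lambda$ wins and ties are broken in favor of the larger power of the logarithm. The Python sketch below spells this out; the names `rlct_key`, `min_rlct` and `leading_term` are hypothetical, and the list of pairs in the usage note is made up for illustration and does not come from any particular graph.

```python
import math
from fractions import Fraction


def rlct_key(pair):
    """Sort key for the order of Definition 3.2: smaller l first, then larger m."""
    ell, m = pair
    return (ell, -m)


def min_rlct(pairs):
    """Minimum of a collection of pairs (l, m) in the order of Definition 3.2."""
    return min(pairs, key=rlct_key)


def leading_term(pair, lam):
    """The function lam^l * (-ln lam)^(m-1) attached to a pair (l, m)."""
    ell, m = pair
    return lam ** float(ell) * (-math.log(lam)) ** (m - 1)
```

For example, among the made-up pairs `(1, 1)`, `(1, 2)` and `(Fraction(1, 2), 1)` the minimum is `(Fraction(1, 2), 1)`, and its leading term dominates the other two once `lam` is small, in line with the sandwich argument in the proof.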
4 Singular Locus

The asymptotic integration theory in Section 3 requires us to analyze the singular locus
$\mathrm{Sing}(f)$ of the real algebraic hypersurface determined by a given polynomial $f$. If $\mathrm{Sing}(f)$
is empty then the hypersurface is smooth and Proposition 3.6 characterizes the asymptotics
of the integral. In this section we return to Gaussian graphical models, we develop tools for
computing the relevant singular loci, and we show that they are empty in many cases. In many
of the remaining cases, the singularities are of the monomial type featured in Proposition 3.5.

Consider any almost-principal minor $f = \det(K_{iR,jR})$ of the concentration matrix $K$ of a
DAG $G$. This is a polynomial function on the parameter space $\mathbb{R}^E$. This polynomial and its
partial derivatives are elements in the polynomial ring $\mathbb{Q}[a_{ij} : (i, j) \in E]$. The Jacobian ideal
of $f$ is the ideal in this polynomial ring generated by $f$ and its partials. We denote it by

\[ \mathrm{Jacob}_{i,j,R} := \langle f \rangle + \left\langle \frac{\partial f}{\partial a_{ij}} : (i, j) \in E \right\rangle. \]

The singular locus $\mathrm{Sing}(f)$ is the subvariety of real affine space $\mathbb{R}^E$ defined by the Jacobian
ideal $\mathrm{Jacob}_{i,j,R}$. The structure of the real variety $\mathrm{Sing}(f)$ governs the volume $V_{i,j|S}(\lambda)$ of the
set $\mathrm{Tube}_{i,j|S}(\lambda)$ of unfaithful parameters. If $\mathrm{Sing}(f) = \emptyset$ then Proposition 3.6 tells us that
$V_{i,j|S}(\lambda)$ asymptotically equals $C\lambda$ for some constant $C > 0$. If the singular locus is not empty
then understanding $\mathrm{Sing}(f)$ is essential for computing its real log canonical threshold $(\ell, m)$.
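
As a toy illustration of the Jacobian ideal and its zero set, consider the two simplest polynomials from Example 3.1. These are not correlation hypersurfaces; the worked computation below, written in LaTeX, is only meant to show the two possible outcomes (empty and non-empty singular locus) in a case that can be verified by hand. The notation $\mathrm{Jacob}(f)$ without indices is our shorthand for this illustration.

```latex
% f = x: the partial derivative with respect to x is 1, so the ideal is the unit ideal.
\[
  f = x, \qquad
  \mathrm{Jacob}(f) = \Big\langle x,\ \frac{\partial f}{\partial x},\ \frac{\partial f}{\partial y} \Big\rangle
  = \langle x,\ 1,\ 0 \rangle = \langle 1 \rangle,
  \qquad \mathrm{Sing}(f) = \emptyset .
\]
% f = xy: the partial derivatives are y and x, so the zero set is the origin.
\[
  f = xy, \qquad
  \mathrm{Jacob}(f) = \langle xy,\ y,\ x \rangle = \langle x,\ y \rangle,
  \qquad \mathrm{Sing}(f) = \{(0, 0)\} .
\]
```

This matches Example 3.1: the smooth line has $(\ell, m) = (1, 1)$, while the singular point of $xy$ produces the logarithmic factor in $(\ell, m) = (1, 2)$.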

We conducted a comprehensive study of all DAGs with few nodes by computing the
singular locus for every almost-principal minor in their concentration matrix $K$. Our first
result concerns the special case of complete graphs. Non-complete graphs will be studied later.

Theorem 4.1. Suppose that $\varphi$ satisfies the assumptions in Proposition 3.4. For any
conditional independence statement on the complete directed graph $K_p$ with $p \le 6$ nodes, we have
$\mathrm{Sing}(f) = \emptyset$, and hence $V_{i,j|S}(\lambda) \approx C\lambda$ for all triples $(i, j, S)$.

It is tempting to conjecture that the hypothesis $p \le 6$ can be removed in this theorem.
Presently we do not know how to approach this problem other than by direct calculation.

Applying Corollary 3.9, this means that the volume of $\lambda$-strong-unfaithful distributions
for the complete graph satisfies $V_{K_p}(\lambda) \approx C\lambda$ for $\lambda \to 0$, which is the best possible behavior
regarding strong-faithfulness. This may be counter-intuitive, but is confirmed in simulations.
In Figure 4 we plot (via (6)) the proportion of strong-unfaithful distributions $V_G(\lambda)$ for the
five graphs in Section 2 for varying values of $\lambda$. Especially in the plot for $p = 10$ it becomes
apparent that the behavior for $\lambda \to 0$ is very different than, say, for $\lambda = 0.001$. For $\lambda \to 0$
we have $V_{\mathrm{complete}}(\lambda) < V_{\mathrm{chain}}(\lambda)$, although the chain graph is much sparser than the complete
graph. Note also that the complete graph $K_{10}$ has $\sum_{k=2}^{10} \binom{10}{k} \binom{k}{2} = 11520$ relevant triples
$(i, j, S)$, whereas for $\mathrm{Chain}_{10}$ there are only $\sum_{k=1}^{9} k\, 2^{k-1} = 4097$ such triples.
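
The two counts quoted above are quick to confirm. The Python sketch below evaluates both sums; the function names are our own, and reading the upper limit of the second sum as 9 is an assumption on our part, chosen because it is the value that reproduces 4097.

```python
from math import comb


def triples_complete(p):
    """Sum over k of C(p, k) * C(k, 2): choose the set {i, j} union S of size k, then the pair."""
    return sum(comb(p, k) * comb(k, 2) for k in range(2, p + 1))


def chain_sum(n):
    """The sum of k * 2^(k-1) for k = 1, ..., n."""
    return sum(k * 2 ** (k - 1) for k in range(1, n + 1))
```

Here `triples_complete(10)` agrees with the direct count $\binom{10}{2} \cdot 2^{8}$ of a pair together with an arbitrary subset of the remaining eight nodes.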

In what follows we explain the algebraic computations that led to Theorem 4.1. We
used ideal-theoretic methods from [3] in their implementation in the Gröbner-based software
packages Macaulay 2 [5] and Singular [4]. An important point to note at the outset is that
the ideal $\mathrm{Jacob}_{i,j,R}$ is almost never the unit ideal. By Hilbert's Nullstellensatz, this means
that the hypersurfaces $f = \det(K_{iR,jR})$ do have plenty of singular points over the field $\mathbb{C}$ of
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log(lambda) log(lambda) 

(a) p = 6 (b) p = 10 



Figure 4: Vfj(A) for the complete graph K p compared to Chain p , Star p , Tripart pi2 j Bow p . 

complex numbers. What Theorem 4.1 asserts is that, in many of the cases of interest to us 
here, none of those singular points have their coordinates in the field M of real numbers. 

In order to study the real variety of an ideal, techniques from real algebraic geometry are 
needed. A key technique is to identify sums of squares (SOS). Indeed, the real Nullstellensatz 
[15] states that the real variety is empty if and only if the given ideal contains a certain type 
of SOS. To apply this to directed Gaussian graphical models, we shall use the fact that every 
principal minor of the covariance matrix or the concentration matrix furnishes such an SOS. 

Lemma 4.2. Every principal minor $\det(K_{R,R})$ of the concentration matrix $K$ of a DAG is
equal to 1 plus a sum of squares in $\mathbb{Q}[a_{ij} : (i,j) \in E]$. In particular, its real variety is empty.

Proof. We can write the principal submatrix $K_{R,R}$ as the product $(A - I)_{R,*} \cdot ((A - I)_{R,*})^T$,
where $(\,\cdot\,)_{R,*}$ refers to the submatrix with row indices $R$. Thus $K_{R,R}$ is the product of an
$|R| \times p$-matrix and its transpose. By the Cauchy-Binet formula, $\det(K_{R,R})$ equals the sum of
squares of all maximal minors of the $|R| \times p$-matrix $(A - I)_{R,*}$. One of these maximal minors,
the one with column indices $R$, is the determinant of an upper triangular matrix with diagonal
entries $-1$, so its square is 1. Hence the polynomial $\det(K_{R,R})$ has the form 1 + SOS. In particular,
the matrix $K_{R,R}$ is invertible for all parameter values in $\mathbb{R}^E$. □
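
The Cauchy-Binet argument can be checked numerically. The sketch below is an illustration under stated assumptions, not the authors' computation: it uses the complete DAG with random rational edge parameters, 0-based node indices, and hypothetical helper names, and it compares $\det(K_{R,R})$ with the sum of squared maximal minors in exact arithmetic.

```python
from fractions import Fraction
from itertools import combinations
import random


def det(m):
    """Determinant by Laplace expansion along the first row (adequate for small matrices)."""
    if not m:
        return Fraction(1)
    return sum((-1) ** c * m[0][c] * det([row[:c] + row[c + 1:] for row in m[1:]])
               for c in range(len(m)))


def random_a_minus_i(p, rng):
    """A - I for the complete DAG on p nodes, with random rational edge parameters."""
    return [[Fraction(-1) if i == j
             else Fraction(rng.randint(-9, 9), rng.randint(1, 9)) if i < j
             else Fraction(0)
             for j in range(p)]
            for i in range(p)]


def principal_minor_two_ways(p, R, seed=0):
    """Return det(K_RR) and the squared maximal minors of (A - I)_{R,*}, keyed by columns.

    R is a sorted list of 0-based row indices.
    """
    b = random_a_minus_i(p, random.Random(seed))
    rows = [b[i] for i in R]                          # the |R| x p matrix (A - I)_{R,*}
    k_rr = [[sum(x * y for x, y in zip(u, v)) for v in rows] for u in rows]
    squares = {cols: det([[r[c] for c in cols] for r in rows]) ** 2
               for cols in combinations(range(p), len(R))}
    return det(k_rr), squares


if __name__ == "__main__":
    k_det, squares = principal_minor_two_ways(5, [1, 3], seed=1)
    print(k_det == sum(squares.values()), squares[(1, 3)])
```

The entry of `squares` indexed by the columns `R` itself is always 1, which is the constant term in "1 + SOS".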

In the context of commutative algebra, it now makes sense to introduce the saturations

$$\mathrm{Singu}_{i,j,R} = \bigl( \mathrm{Jacob}_{i,j,R} : \det(K_{R,R})^{\infty} \bigr),$$
$$\mathrm{Singu}^*_{i,j,R} = \bigl( \mathrm{Singu}_{i,j,R} : \bigl(\textstyle\prod_{(s,t) \in E} a_{st}\bigr)^{\infty} \bigr).$$

These are also ideals in $\mathbb{Q}[a_{ij} : (i,j) \in E]$. By definition, $\mathrm{Singu}_{i,j,R}$ consists of all polynomials
that get multiplied into the Jacobian ideal by some power of the determinant of $K_{R,R}$, and
$\mathrm{Singu}^*_{i,j,R}$ consists of polynomials that get multiplied into $\mathrm{Singu}_{i,j,R}$ by some monomial. By
[3, §4.4], the variety of $\mathrm{Singu}_{i,j,R}$ is the Zariski closure of the set-theoretic difference of the
variety of $\mathrm{Jacob}_{i,j,R}$ and the hypersurface $\{\det(K_{R,R}) = 0\}$. We saw in Lemma 4.2 that the
latter hypersurface has no real points. The ideal $\mathrm{Singu}^*_{i,j,R}$ represents singularities in $(\mathbb{R}^*)^E$.

Corollary 4.3. The singular locus of the real algebraic hypersurface $\{\det(K_{iR,jR}) = 0\}$ in $\mathbb{R}^E$
coincides with the set of real zeros of the ideal $\mathrm{Singu}_{i,j,R}$. The set of real zeros of $\mathrm{Singu}^*_{i,j,R}$
is the Zariski closure of the subset of all singular points whose coordinates are non-zero.

Proof of Theorem 4.1. We computed the ideals $\mathrm{Jacob}_{i,j,R}$ and $\mathrm{Singu}_{i,j,R}$ for every almost-principal
minor $K_{iR,jR}$ in the concentration matrices of the graphs $G = K_3, K_4, K_5, K_6$. In all
cases the ideal $\mathrm{Singu}_{i,j,R}$ was found to equal the unit ideal $\langle 1 \rangle$. These exhaustive computations
were carried out using the software Singular [4]. This establishes Theorem 4.1. □

We briefly discuss our computations for the complete directed graph on six nodes. 

Example 4.4. Fix the complete directed graph $G = K_6$. We tested all 240 conditional
independence statements and computed the corresponding ideal $\mathrm{Singu}_{i,j,R}$. We discuss one
interesting instance, namely $i = 1$, $j = 3$, $R = \{2,4\}$. The almost-principal minor

$$K_{241,243} = \begin{pmatrix}
a_{23}^2 + a_{24}^2 + a_{25}^2 + a_{26}^2 + 1 & a_{25}a_{45} + a_{26}a_{46} - a_{24} & a_{24}a_{34} + a_{25}a_{35} + a_{26}a_{36} - a_{23} \\
a_{25}a_{45} + a_{26}a_{46} - a_{24} & a_{45}^2 + a_{46}^2 + 1 & a_{35}a_{45} + a_{36}a_{46} - a_{34} \\
a_{13}a_{23} + a_{14}a_{24} + a_{15}a_{25} + a_{16}a_{26} - a_{12} & a_{15}a_{45} + a_{16}a_{46} - a_{14} & a_{14}a_{34} + a_{15}a_{35} + a_{16}a_{36} - a_{13}
\end{pmatrix}$$

contains all 15 parameters except $a_{56}$. Its determinant is a polynomial of degree 6. Of its 14
partial derivatives, 13 have degree 5. The derivative with respect to $a_{12}$ has degree 4. Thus
$\mathrm{Jacob}_{1,3,\{2,4\}}$ is generated by 15 polynomials of degrees 4, 5, ..., 5, 6. The matrix $K_{24,24}$ is the
upper left $2 \times 2$-block in the matrix above. The square of its determinant is a polynomial of
degree 8 that happens to lie in the ideal $\mathrm{Jacob}_{1,3,\{2,4\}}$. This proves $\mathrm{Singu}_{1,3,\{2,4\}} = \langle 1 \rangle$. □

For graphs $G$ that are not complete, $\mathrm{Singu}_{i,j,R}$ may not be the unit ideal. We already
saw one non-obvious instance of this for the tripartite graph in Example 2.1. Here is an even 
smaller example where the Jacobian ideal and its saturations are equal, and not the unit ideal. 



Example 4.5. Let $p = 4$ and take $G$ to be the almost-complete graph with adjacency matrix

$$A_G = \begin{pmatrix} 0 & 0 & a_{13} & a_{14} \\ 0 & 0 & a_{23} & a_{24} \\ 0 & 0 & 0 & a_{34} \\ 0 & 0 & 0 & 0 \end{pmatrix}.$$

The conditional independence statement $1 \perp\!\!\!\perp 2 \mid 4$ is represented by the almost-principal minor

$$K_{31,32} = \begin{pmatrix} a_{34}^2 + 1 & a_{24}a_{34} - a_{23} \\ a_{14}a_{34} - a_{13} & a_{13}a_{23} + a_{14}a_{24} \end{pmatrix}$$

of the concentration matrix. The determinant of this minor factors into two binomial factors:

$$\det(K_{31,32}) = (a_{13}a_{34} + a_{14})(a_{23}a_{34} + a_{24}). \qquad (10)$$

The Jacobian ideal is the prime ideal generated by these factors:

$$\mathrm{Jacob}_{1,2,3} = \mathrm{Singu}_{1,2,3} = \mathrm{Singu}^*_{1,2,3} = \langle\, a_{13}a_{34} + a_{14},\; a_{23}a_{34} + a_{24} \,\rangle.$$

The left equality holds because $\det(K_{3,3}) = a_{34}^2 + 1$ is a non-zerodivisor modulo $\mathrm{Jacob}_{1,2,3}$.
The singular locus of (10) is the three-dimensional real variety defined by this binomial ideal
in the parameter space $\mathbb{R}^5$. Its real log canonical threshold is found to be $(\ell, m) = (1, 2)$. □
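
The factorization (10) and the description of the singular locus can be verified by exact evaluation. The following sketch (hypothetical function names, random rational test points) compares the $2 \times 2$ determinant with the factored form and checks that the gradient vanishes wherever both binomial factors vanish.

```python
from fractions import Fraction
import random


def minor_31_32(a13, a14, a23, a24, a34):
    """det of the 2x2 minor K_{31,32} for the DAG of Example 4.5 (a12 = 0)."""
    k33 = a34 ** 2 + 1
    k32 = a24 * a34 - a23
    k13 = a14 * a34 - a13
    k12 = a13 * a23 + a14 * a24
    return k33 * k12 - k32 * k13


def factored(a13, a14, a23, a24, a34):
    """Right-hand side of (10)."""
    return (a13 * a34 + a14) * (a23 * a34 + a24)


def gradient(a13, a14, a23, a24, a34):
    """Partial derivatives of the factored form, obtained with the product rule."""
    u = a13 * a34 + a14
    v = a23 * a34 + a24
    return (a34 * v, v, a34 * u, u, a13 * v + a23 * u)


if __name__ == "__main__":
    rng = random.Random(0)
    pt = [Fraction(rng.randint(-9, 9), rng.randint(1, 9)) for _ in range(5)]
    print(minor_31_32(*pt) == factored(*pt))
```

Choosing $a_{13}, a_{23}, a_{34}$ freely and setting $a_{14} = -a_{13}a_{34}$, $a_{24} = -a_{23}a_{34}$ gives a point where all five partial derivatives are zero, in line with the three-dimensional singular locus described above.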

This example inspired us to analyze the partial correlations of all small DAGs with $p \le 4$
nodes. In our experiments, we found that $\det(K_{iR,jR})$ is frequently the product of a monomial
with a strictly positive sum of squares. Such cases are denoted as "Monomial" in Tables 1 and
2. For these, the RLCT is read off directly from Proposition 3.5. The rows labeled "Smooth"
cover cases that are not monomial but where $\mathrm{Singu}_{i,j,R}$ is the unit ideal, so Proposition 3.6
gives us the RLCT. The next theorem summarizes the complete results. The trivial case $p = 2$
is excluded because there is only one graph $1 \to 2$, with $\mathrm{RLCT}(1, 2 \mid \emptyset) = (1, 1)$.

Theorem 4.6. Under the assumptions in Theorem 3.8, for all DAGs with $p \le 4$ nodes and
all triples $(i,j,S)$, the value $\mathrm{RLCT}(i, j \mid S)$ is given in Tables 1 and 2. In all cases but one, we
have $\mathrm{RLCT}(i, j \mid S) = (1, m)$ where $m < p$.

To establish Theorem 4.6 we listed every DAG $G$ and every triple $(i,j,S)$ that is not
d-separated in $G$. The rows "Monomial" and "Smooth" were discussed above. The row "Normal
crossing" refers to cases that are covered by Theorem 7.1. The "Special" cases are treated in
Examples 4.5 and 4.8. Lastly, the row "Blowup" represents instances where the real singular
locus is a linear space. Our computation of $\mathrm{RLCT}(i, j \mid S) = (\ell, m)$ for such instances uses the
method explained in Example 7.4. We now examine the unique exceptional case where $\ell \ne 1$.



Table 1: RLCT for all DAGs with three nodes

                   (1,1)   (1,2)   Subtotal
Monomial              21       3         24
Smooth                 3       -          3
Subtotal              24       3         27

Table 2: RLCT for all DAGs with four nodes

                   (1,1)   (1,2)   (1,3)   (1/2,1)   Subtotal
Monomial             568     145      14         1        728
Smooth               198       -       -         -        198
Normal crossing        -      22       2         -         24
Blowup                12       -       -         -         12
Special                2       1       -         -          3
Subtotal             780     168      16         1        965

Example 4.7. Let $p = 4$ and $G = \mathrm{Tripart}_{4,1}$. Its concentration matrix may be obtained from
Example 4.5 by setting $a_{14} = a_{24} = 0$. The partial correlation for $1 \perp\!\!\!\perp 2 \mid 4$ is now given by

$$\det(K_{13,23}) = a_{13}a_{23}a_{34}^2.$$

For this monomial, Proposition 3.5 tells us that $(\ell, m) = \mathrm{RLCT}(1,2 \mid 4) = (1/2, 1)$. □

Here is an interesting case where the RLCT depends in a subtle way on the choice of $\Omega$.

Example 4.8. Consider the conditional independence statement $1 \perp\!\!\!\perp 3 \mid 4$ for the DAG in
Figure 5. The partial correlation is represented by the almost-principal minor

$$\det(K_{12,23}) = a_{13} \cdot g \quad \text{where} \quad g = a_{23}a_{24}a_{34} + a_{24}^2 + 1.$$

The component $\{g = 0\}$ is smooth in $\mathbb{R}^4$. However, it is disjoint from the cube $\Omega = [-1, 1]^4$.
To see this, note that $-1 \le a_{23}a_{24}a_{34}$ in $\Omega$. With this, $g = 0$ would imply $a_{24} = 0$ and
hence $g = 1$, a contradiction. Consequently, if $\Omega$ is the cube $[-1,1]^4$ then the correlation
hypersurface is simply $\{a_{13} = 0\}$, and the RLCT equals $(1, 1)$ by Proposition 3.5. The other
special case with RLCT = $(1, 1)$ in Table 2 comes from swapping the labels of nodes 1 and 2.

Now, if we enlarge the parameter space $\Omega$ then the situation changes. For instance,
suppose $(a_{13}, a_{23}, a_{24}, a_{34}) = (0, -2, 1, 1)$ is in the interior of $\Omega$. This is a singular point of
$\det(K_{12,23}) = a_{13} \cdot g$. The RLCT can be computed by applying Theorem 7.1. It is now $(1, 2)$
instead of $(1, 1)$. This example shows that the asymptotics of $V_{ij|S}(\lambda)$ depends on $\Omega$. □
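
Both claims of Example 4.8 are easy to probe numerically. The sketch below is a sanity check rather than a proof: it evaluates $g$ on a uniform grid in the cube (the grid resolution is an arbitrary choice) and evaluates the gradient of $f = a_{13} \cdot g$ at the point $(0, -2, 1, 1)$. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def g(a23, a24, a34):
    """The factor g = a23*a24*a34 + a24^2 + 1 from Example 4.8."""
    return a23 * a24 * a34 + a24 ** 2 + 1


def min_g_on_grid(steps=20):
    """Smallest value of g on a uniform grid in the cube [-1, 1]^3 (exact arithmetic)."""
    pts = [Fraction(2 * k, steps) - 1 for k in range(steps + 1)]
    return min(g(x, y, z) for x, y, z in product(pts, repeat=3))


def grad_f(a13, a23, a24, a34):
    """Gradient of f = a13 * g with respect to (a13, a23, a24, a34)."""
    return (g(a23, a24, a34),
            a13 * a24 * a34,
            a13 * (a23 * a34 + 2 * a24),
            a13 * a23 * a24)


if __name__ == "__main__":
    print(min_g_on_grid(), g(-2, 1, 1), grad_f(0, -2, 1, 1))
```

On the grid, $g$ stays bounded away from zero, while outside the cube the point $(0, -2, 1, 1)$ lies on both components $\{a_{13} = 0\}$ and $\{g = 0\}$ and the full gradient vanishes there.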

Remark 4.9. We briefly return to the issue of faithfulness in the PC-algorithm. Zhang and
Spirtes [20] introduced a variant known as the conservative PC-algorithm. It is less informative
than the PC-algorithm, but only requires adjacency-faithfulness for correct inference. The
adjacency-faithfulness condition is simply strong-faithfulness restricted to the edges of $G$:

$$|\mathrm{corr}(i, j \mid S)| > \lambda \quad \text{for all } (i,j) \in E \text{ and } S \subset V \setminus \{i,j\}.$$

If $\{i, j\}$ is not adjacent to $R$ then the relevant minor equals $\det(K_{iR,jR}) = a_{ij}\det(K_{R,R}) + f(\bar{a})$,
where $f$ is a polynomial in $\bar{a} = \{a_{st} \mid (s,t) \ne (i,j)\}$. Since $\det(K_{R,R})$ is strictly positive by
Lemma 4.2, the derivative with respect to $a_{ij}$ never vanishes. Hence the correlation hypersurface is smooth,
and $(\ell, m) = (1, 1)$. If $\{i,j\}$ is adjacent to $R$ then the behavior can be wild, as in Example 4.8.

5 Asymptotics for trees 

In [17] trees were treated as one class. However, as already noted when discussing Figure 2, 
there is a striking difference between the volume $V_G(\lambda)$ for chain graphs compared to stars. In
this section, we give an explanation for this difference based on real log canonical thresholds. 

Figure 5: 4-node DAG.
We use the notation SOS(a) for any polynomial that is a sum of squares of polynomials
in the model parameters $(a_{ij})_{(i,j) \in E}$. Suppose that $G$ is a tree on $V = \{1, 2, \ldots, p\}$ and let $m$
be the length of the longest undirected path in $G$. It was shown in [17, Corollary 4.3(a)] that
any non-zero almost-principal minor of the concentration matrix $K$ has the form

$$\det(K_{iR,jR}) = (1 + \mathrm{SOS}(a)) \cdot a_{i \leftrightarrow j}, \qquad (11)$$

where $a_{i \leftrightarrow j}$ is the monomial of degree $\le m$ formed by multiplying the parameters $a_{rs}$ along
the unique path between $i$ and $j$. Specifically, for the two trees in Figure 1 we have

$$\det(K_{iR,jR}) = \begin{cases}
(1 + \mathrm{SOS}(a)) \prod_{k=i}^{j-1} a_{k,k+1} & \text{if } G = \mathrm{Chain}_p, \\
(1 + \mathrm{SOS}(a)) \cdot a_{1,i}\,a_{1,j} & \text{if } G = \mathrm{Star}_p \text{ and } i, j > 1.
\end{cases}$$

In both cases, the term SOS(a) disappears when $i$ and $j$ are leaves of the tree $G$; cf. (13), (14).

Since the correlation hypersurfaces for trees are essentially given by monomials, we can
apply Proposition 3.5. The minimal real log canonical threshold is $(1, m)$ where $m$ is the
largest degree of any of the monomials in (11). Corollary 3.9 implies the following result:

Theorem 5.1. Under the assumptions in Theorem 3.8, if $G$ is a tree then the volume of
$\lambda$-strong-unfaithful distributions satisfies

$$V_G(\lambda) \approx C\,\lambda\,(-\ln\lambda)^{m-1}$$

where $m$ is the length of the longest path in the tree $G$, and $C$ is a suitable constant.

For the case of stars we have $m = 2$, whereas for chain graphs we have $m = p - 1$.

Corollary 5.2. Under the assumptions in Theorem 5.1, the volume $V_G(\lambda)$ of strong-unfaithful
distributions satisfies

$$V_G(\lambda) \approx \begin{cases}
C_{\mathrm{chain}} \cdot \lambda(-\ln\lambda)^{p-2} & \text{if } G = \mathrm{Chain}_p, \\
C_{\mathrm{star}} \cdot \lambda(-\ln\lambda) & \text{if } G = \mathrm{Star}_p,
\end{cases} \qquad (12)$$

where $C_{\mathrm{chain}}$ and $C_{\mathrm{star}}$ are suitable positive constants.

As a consequence, the volume $V_G(\lambda)$ is asymptotically larger for chains compared to stars,
and the difference increases with increasing number of nodes $p$. This furnishes an explanation
for Figure 2, at least for small values of $\lambda$. In that figure we saw the curve for the chain lying
clearly above the curve for the star tree. However, one subtle issue is the size of the constants
$C_{\mathrm{chain}}$ and $C_{\mathrm{star}}$. These need to be understood in order to make accurate comparisons.

In Section 8, we develop new theoretical results regarding the computation of the constant 
C in (7). Theorem 8.5 gives an integral representation for C when the partial correlation 
hypersurface is essentially defined by a monomial. In Example 8.7 we shall then derive: 

Corollary 5.3. The two constants in (12) are 

Cchain = 7 ^T. an d C s tar = ( „ 

{p-2)\ V 2 
This result surprised us at first. It establishes the counterintuitive fact that, as $p$ grows,
the constant for the lower curve in Figure 2 is exponentially larger than that for the upper
curve. Therefore, in order to fully explain the relative position of the two curves for a wider
range of values of $\lambda > 0$, it does not suffice to just consider the first order asymptotics (7).
Instead, we need to consider some of the higher order terms in the series expansion (8).

As we shall see in Section 8, it is difficult to determine the constants $C_{\ell,m}$ in (8) analytically.
In the remainder of this section, we propose a procedure based on simulation and linear
regression for estimating the constants $C_{\ell,m}$ in the asymptotic expansions of the volumes
$V_G(\lambda)$ or $V_{ij|S}(\lambda)$. For simplicity we focus on the latter case and we take $f = \det(K_{iR,jR})$.

Suppose that $G$ is a DAG for which the real log canonical thresholds $(\ell, m)$ in Theorem
3.8 and Corollary 3.9 are known. This is the case for all trees by Theorem 5.1. Our procedure
goes as follows. We first sample $n$ points uniformly from $\Omega$ and compute the proportion of
points $\omega$ that lie in $\mathrm{Tube}_{ij|S}(\lambda)$ for different values of $\lambda$. We then fit a linear model to
these proportions, using the terms of the expansion (8) that are determined by $(\ell, m)$,
where $(\ell, m)$ is the known real log canonical threshold.

In the following, we illustrate this procedure for chain graphs and stars. We analyze two
specific examples of partial correlation volumes, namely the ones corresponding to the longest
paths in each graph, that is $V_{1,p|\emptyset}(\lambda)$ for $\mathrm{Chain}_p$ and $V_{2,3|\emptyset}(\lambda)$ for $\mathrm{Star}_p$. For chain graphs,

$$\mathrm{corr}(1,p) = \frac{\prod_{i=1}^{p-1} a_{i,i+1}}{\sqrt{1 + a_{p-1,p}^2\bigl(1 + a_{p-2,p-1}^2\bigl(\cdots(1 + a_{12}^2)\bigr)\bigr)}}, \qquad (13)$$

whereas for star graphs,

$$\mathrm{corr}(2,3) = \frac{a_{12}\,a_{13}}{\sqrt{(1 + a_{12}^2)(1 + a_{13}^2)}}. \qquad (14)$$

We first approximate $V_{1,p|\emptyset}(\lambda)$ for chain graphs and $V_{2,3|\emptyset}(\lambda)$ for star graphs by simulation
for various values of $\lambda$. This means that we sample $n$ points uniformly in the parameter space
$\Omega$ and count how many of them yield a correlation of absolute value at most $\lambda$. The results for
$p = 6$ and $p = 10$ are shown in Figure 6. These are based on a sample size of $n = 1{,}000{,}000$.
We then fit a linear model

$$V_{1,p|\emptyset}(\lambda)/\lambda \approx C_{p-2}(-\ln\lambda)^{p-2} + C_{p-3}(-\ln\lambda)^{p-3} + \cdots + C_0$$

for chain graphs. The curve resulting from the regression estimates is shown in black in Figure
6. The curve resulting from the first-order approximation with the constants computed using
Corollary 5.3 is shown in grey in Figure 6. We note that especially for chain graphs, where
the true constant in Corollary 5.3 is small, the first order approximation is very bad.

The approximation by regression on the other hand is a fast way to get fairly accurate
estimates of all constants. The same was done with star graphs, but with the linear model

$$V_{2,3|\emptyset}(\lambda)/\lambda \approx C_1(-\ln\lambda) + C_0.$$

Figure 6 shows that the first-order approximation is more accurate for stars than for chains.
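
The simulation-and-regression procedure can be sketched for the star case as follows. This is a minimal illustration, not the code used for Figure 6: it assumes the parameter space is the square $[-1,1]^2$ with the uniform measure, uses a smaller sample size, regresses the estimated volume divided by $\lambda$ on $-\ln\lambda$, and all function names are hypothetical.

```python
import math
import random
from bisect import bisect_right


def corr_star(a12, a13):
    """corr(2, 3) for a star with centre 1, following (14)."""
    return a12 * a13 / math.sqrt((1 + a12 ** 2) * (1 + a13 ** 2))


def estimate_volumes(lambdas, n=100_000, seed=0):
    """Monte Carlo estimate of the proportion of samples with |corr| <= lambda.

    Assumption: the parameters are uniform on the square [-1, 1]^2.
    """
    rng = random.Random(seed)
    values = sorted(abs(corr_star(rng.uniform(-1, 1), rng.uniform(-1, 1)))
                    for _ in range(n))
    # bisect_right counts the sampled values that are at most lam
    return [bisect_right(values, lam) / n for lam in lambdas]


def fit_line(xs, ys):
    """Closed-form ordinary least squares for y = c1 * x + c0."""
    k = len(xs)
    mx, my = sum(xs) / k, sum(ys) / k
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    c1 = sxy / sxx
    return c1, my - c1 * mx


if __name__ == "__main__":
    lambdas = [10 ** (-e / 4) for e in range(4, 13)]      # from 0.1 down to 0.001
    volumes = estimate_volumes(lambdas)
    c1, c0 = fit_line([-math.log(lam) for lam in lambdas],
                      [v / lam for v, lam in zip(volumes, lambdas)])
    print(f"C1 ~ {c1:.3f}, C0 ~ {c0:.3f}")
```

For chain graphs the same idea applies with (13) in place of (14) and a polynomial of degree $p - 2$ in $-\ln\lambda$, which requires a general least-squares solver instead of the two-parameter closed form used here.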
Figure 6: Regression-based asymptotics for chain graphs and stars. Panel shown: (b) p = 10.

6 Volume inequalities for bias reduction in causal inference 

We now discuss the problem of quantifying bias in causal models. Our point of departure is 
Greenland's paper [7]. In contrast to the previous sections, in the situation discussed here, a 
large tube volume is in fact desired since it corresponds to small bias. In this section we use 
the notation $K_{ij|S}$ for the almost-principal minor $K_{iR,jR}$ of the concentration matrix.

We are interested in estimating the direct effect of an exposure $E$ on a disease outcome $D$
from the partial correlation $\mathrm{corr}(E, D \mid S)$, where $S$ is a subset of measurable variables. This
partial correlation is a weighted sum over all open paths (i.e. paths which d-connect $E$ to $D$)
given $S$ (the direct path $a_{ED}$ being just one of them). For estimating the direct effect $a_{ED}$
from $\mathrm{corr}(E, D \mid S)$, all open paths other than the direct path are thus considered as bias. We
shall analyze two forms of bias which are of particular interest in practice, namely confounding
bias and collider-stratification bias. We start by defining collider-stratification bias.

Suppose we are given a DAG $G$ with $D, E \in V$ and there is another node $C$ such that

$$E \to V_1 \to \cdots \to V_s \to C \leftarrow W_1 \leftarrow \cdots \leftarrow W_t \leftarrow D.$$

This says that $C$ is a collider on a path from $D$ to $E$. Stratifying (i.e. conditioning) on $C$
opens a path between $E$ and $D$ leading to bias when estimating $a_{ED}$. The partial correlation
corresponding to the opened path between $E$ and $D$ is known as collider-stratification bias.

Example 6.1. We illustrate collider-stratification bias for the tripartite graph $G = \mathrm{Tripart}_{5,1}$
shown in Figure 7(a). Let node 1 represent the exposure $E$ and node 2 the disease outcome $D$.
In this example, node 5 is a collider $C$ for multiple paths between $E$ and $D$. When stratifying
on $C = 5$, node $E = 1$ is d-connected to node $D = 2$ via the following paths:

$$1 \to 3 \to 5 \leftarrow 4 \leftarrow 2, \quad 1 \to 4 \to 5 \leftarrow 3 \leftarrow 2, \quad 1 \to 3 \to 5 \leftarrow 3 \leftarrow 2, \quad 1 \to 4 \to 5 \leftarrow 4 \leftarrow 2.$$

Figure 7: Various tripartite and almost tripartite graphs.

The bias introduced for estimating the direct effect of $E$ on $D$ when conditioning on $C$ is

$$\mathrm{corr}(1,2 \mid 5) = \frac{a_{13}a_{35}a_{45}a_{24} + a_{14}a_{45}a_{35}a_{23} + a_{13}a_{35}^2a_{23} + a_{14}a_{45}^2a_{24}}{\sqrt{\det(K_{134,134})\,\det(K_{234,234})}}. \qquad (15)$$

The numerator $\det(K_{134,234})$ is the weighted sum of all open paths between $E$ and $D$. Similarly,
nodes 3 and 4 are colliders for multiple paths. The bias when conditioning on these is

$$\mathrm{corr}(1,2 \mid 34) = \mathrm{corr}(1,2 \mid 345) = \frac{a_{13}a_{23} + a_{14}a_{24}}{\sqrt{(a_{13}^2 + a_{14}^2 + 1)(a_{23}^2 + a_{24}^2 + 1)}}. \qquad (16)$$

Problem 6.2 is about comparing the tube volume for (15) with the tube volume for (16). □

A question of practical interest in causal inference is to understand the situations in which 
stratifying on a collider leads to a particularly large bias. It is widely believed that collider- 
stratification bias tends to attenuate when it arises from more extended paths (see [2, 7]). 
What follows is our interpretation of this statement as a precise mathematical conjecture. 

Problem 6.2. Let $D, E \in V$ and $\mathcal{C} = \{C \in V \mid \exists \text{ path } P \text{ from } E \text{ to } D \text{ with } C \text{ as a collider}\}$.
We introduce a partial order on the collider set $\mathcal{C}$ by setting $C \le C'$ if all paths on which $C$
is a collider also go through $C'$. Given subsets $S, S' \subset \mathcal{C}$ we set $S \le S'$ if for all $C \in S$ there
exists $C' \in S'$ such that $C \le C'$. If this holds, then the bias introduced when conditioning on
$S$ should be smaller than when conditioning on $S'$. To make this precise, we conjecture:

$$V_{D,E|S}(\lambda) \ge V_{D,E|S'}(\lambda) \quad \text{for all } S \le S' \text{ and all } \lambda \in [0,1]. \qquad (17)$$



We now study this conjecture for the tripartite graphs $\mathrm{Tripart}_{p,p'}$. Here, it says that the
collider-stratification bias introduced when conditioning on the third level $\{p - p' + 1, \ldots, p\}$
is in general smaller than when conditioning on the second level of nodes $\{3, \ldots, p - p'\}$, i.e.

$$V_{1,2|p-p'+1,\ldots,p}(\lambda) \ge V_{1,2|3,\ldots,p-p'}(\lambda). \qquad (18)$$

This inequality is confirmed by the simulations shown in Figure 8(a). Here $p = 5$, $p' = 2$ is
shown in red and $p = 10$, $p' = 2$ is shown in blue. The solid lines correspond to the volume
$V_{1,2|p-p'+1,\ldots,p}(\lambda)$, whereas the dashed lines correspond to the volume $V_{1,2|3,\ldots,p-p'}(\lambda)$.

Going beyond simulations, we now present an algebraic proof of our conjecture when $\lambda$ is
small, for the tripartite graphs in Figure 7(c) where the second level has only one node.
Figure 8: Effect of collider-bias on complete tripartite graphs and bow-ties.
Panels: (a) Complete tripartite graphs, (b) Bow-ties.



Example 6.3. For $G = \mathrm{Tripart}_{p,p-3}$ the left hand side of (18) is given by

$$\det(K_{12|4,5,\ldots,p}) = a_{13}a_{23}\Bigl(\sum_{k=4}^{p} a_{3k}^2\Bigr).$$

Depending on the values of $p$, the corresponding real log canonical threshold is given by

$$\mathrm{RLCT}(1,2 \mid 4,\ldots,p) = \begin{cases}
(1/2, 1) & \text{if } p = 4, \\
(1, 3) & \text{if } p = 5, \\
(1, 2) & \text{if } p \ge 6.
\end{cases} \qquad (19)$$

For $p = 4$ this was Example 4.7. To prove (19) for $p \ge 5$, we need two ingredients. Firstly, if
the polynomial is a product of factors with disjoint variables, then the RLCT is the minimum
of the RLCT of the factors, taken with multiplicity (e.g. if the RLCTs are $(\ell, m_1)$ and $(\ell, m_2)$,
then the combined RLCT is $(\ell, m_1 + m_2)$, just like in the case of a monomial). Secondly, the
RLCT of a sum of squares of $d$ unknowns is equal to $(d/2, 1)$. We saw this in Example 3.3.
For the right hand side of (18), we condition on node 3. Now, the defining polynomial is

$$\det(K_{12|3}) = a_{13}a_{23}.$$

By Proposition 3.5, this has RLCT $(1,2)$, which is larger or equal to all values of $(\ell, m)$ in
(19). To compare the behavior of $V_{1,2|3}(\lambda)$ and $V_{1,2|4,\ldots,p}(\lambda)$ for small $\lambda$, we will need to derive
the constant $C$ in (7). In Example 8.8, we will show that if $p \ge 6$ and the parameter space is

$$\Omega = \{a \in \mathbb{R}^{p-1} : |a_{13}| \le 1,\ |a_{23}| \le 1,\ a_{34}^2 + \cdots + a_{3p}^2 \le 1\}, \qquad (20)$$

then the asymptotic constants are given by $C_{12|3} = 1$ and $C_{12|4,\ldots,p} = 2 + 2/(p - 5)$. We
conclude that $V_{1,2|3}(\lambda) < V_{1,2|4,\ldots,p}(\lambda)$ for small values of $\lambda$, as conjectured in Problem 6.2. □
Example 6.4. A slight twist to Example 6.3 is the almost-chain graph shown in Figure 7(d),
with edges $E = \{(1,3), (2,3), (3,4), \ldots, (p-1,p)\}$. For such graphs, Problem 6.2 asks whether

$$V_{1,2|s}(\lambda) \le V_{1,2|t}(\lambda) \quad \text{if } s < t.$$

This holds for small $\lambda$ because $\det(K_{12|s}) = a_{13}a_{23}\prod_{k=3}^{s-1} a_{k,k+1}^2$. By Proposition 3.5,

$$\mathrm{RLCT}(1,2 \mid s) = \begin{cases}
(1, 2) & \text{if } s = 3, \\
(1/2, s - 3) & \text{if } s \ge 4,
\end{cases}$$

so $\mathrm{RLCT}(1, 2 \mid s) \ge \mathrm{RLCT}(1, 2 \mid t)$ for $s < t$. □

In Example 6.3 we resolved Problem 6.2 for tripartite graphs whose middle level consists
of one node. We next consider the case $\mathrm{Tripart}_{p,1}$ where the third level has one node.

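Example 6.5. The graph $\mathrm{Tripart}_{5,1}$ shown in Figure 7(a) was discussed in Example 6.1. We
focus on the numerators in (15) and in (16). The polynomial (15) will be studied in Example
7.5 where we prove that $\mathrm{RLCT}(1, 2 \mid 5) = (1, 3)$. Using the same method for $\mathrm{Tripart}_{p,1}$ gives

$$\mathrm{RLCT}(1,2 \mid p) = \begin{cases}
(1/2, 1) & \text{if } p = 4, \\
(1, 3) & \text{if } p = 5, \\
(1, 2) & \text{if } p \ge 6,
\end{cases}
\qquad
\mathrm{RLCT}(1,2 \mid 3,\ldots,p-1) = \begin{cases}
(1, 2) & \text{if } p = 4, \\
(1, 1) & \text{if } p \ge 5.
\end{cases}$$

Thus, we conclude that $V_{1,2|p}(\lambda) \ge V_{1,2|3,\ldots,p-1}(\lambda)$ for small values of $\lambda > 0$. □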

Example 6.6. For the graph $\mathrm{Tripart}_{6,2}$ in Figure 7(b) we check if $V_{1,2|5,6}(\lambda) \ge V_{1,2|3,4}(\lambda)$
for small $\lambda$. As before, $\mathrm{RLCT}(1, 2 \mid 3, 4) = (1, 1)$, but a hard computation using the tools of
Section 7 reveals that now $\mathrm{RLCT}(1, 2 \mid 5, 6) = (1, 1)$. Thus, knowledge of the RLCT is not
sufficient to establish (17). What is needed is a finer analysis along the lines of Section 8. □

The second form of bias studied by Greenland [7] is confounder bias. In the context of a 
directed graphical model G, a confounder for the effect of E on D is a node C such that 

$$E \leftarrow V_1 \leftarrow \cdots \leftarrow V_s \leftarrow C \to W_1 \to \cdots \to W_t \to D.$$

The partial correlation introduced by the path from E to D passing through C is referred to 
as confounder bias. In such situations, stratifying on C blocks the path between E and D 
(i.e. C d-separates E from D) and therefore corresponds to bias removal. 

In certain graphs, such as the bow-tie example in [7], there are variables where stratifying
removes confounder bias but at the same time introduces collider-stratification bias. For
instance, consider the graph $G = \mathrm{Bow}_5$, where node 4 corresponds to exposure $E$ and node 5
corresponds to disease outcome $D$. Then conditioning on node 3 blocks the paths

$$4 \leftarrow 3 \to 5, \quad 4 \leftarrow 1 \to 3 \to 5, \quad 4 \leftarrow 3 \leftarrow 2 \to 5, \qquad (21)$$

and therefore reduces confounder bias, but opens the path

$$4 \leftarrow 1 \to 3 \leftarrow 2 \to 5 \qquad (22)$$

and therefore introduces collider-stratification bias. This trade-off is of particular interest in
situations where one cannot condition on 1 and 2, for example because these variables were 
unmeasured. It is believed that in such examples the bias removed by conditioning on the 
confounders is larger than the collider-stratification bias introduced and one should therefore 
stratify. We translate this statement into the following mathematical problem: 

Problem 6.7. Let $D, E \in V$, and denote by $\mathcal{D}$ the confounder-collider subset, i.e.

$$\mathcal{D} = \mathcal{C} \cap \{C \in V \mid \exists \text{ path } \pi \text{ from } E \text{ to } D \text{ having } C \text{ as a confounder}\}.$$

We conjecture the following inequality for the relevant tube volumes:

$$V_{D,E|S}(\lambda) \ge V_{D,E|\emptyset}(\lambda), \quad \text{for all } S \subset \mathcal{D} \text{ and all } \lambda \in [0, 1].$$

This conjectural inequality is interesting for the bow-tie graphs $\mathrm{Bow}_p$. It means that
conditioning on the nodes in the second level reduces bias since the bias removed by conditioning
on the confounders is larger than the collider-stratification bias introduced by conditioning:

$$V_{p-1,p|3,\ldots,p-2}(\lambda) \ge V_{p-1,p|\emptyset}(\lambda). \qquad (23)$$

This is confirmed by our simulations in Figure 8(b), for $p = 5$ in red and $p = 10$ in blue.
The solid line corresponds to the volume $V_{p-1,p|3,\ldots,p-2}(\lambda)$ and the dashed line corresponds to
$V_{p-1,p|\emptyset}(\lambda)$. In the following example we prove the inequality (23) for $p = 5$ and small $\lambda > 0$.

Example 6.8. Let $G = \mathrm{Bow}_5$ as in Figure 1(d). The left hand side of (23) is represented by

$$\det(K_{4,5|3}) = a_{13}a_{14}a_{23}a_{25}.$$

This monomial is the path in (22). The corresponding real log canonical threshold is $(1,4)$.
The polynomial representing the right hand side of (23) is a weighted sum of the paths in (21):

$$\det(K_{4,5|\emptyset}) = a_{34}a_{35}(1 + a_{13}^2 + a_{23}^2) + a_{23}a_{25}a_{34} + a_{13}a_{14}a_{35}.$$

We derive its real log canonical threshold using the blowups described in Section 7. We find
that it is $(1,1)$. Since $(1,4) < (1,1)$, we conclude $V_{4,5|3}(\lambda) > V_{4,5|\emptyset}(\lambda)$ for small $\lambda$. □



7 Normal crossing and blowing up 

In this section we develop more refined techniques for computing real log canonical thresholds. 
The following theorem combines the monomial case of Proposition 3.5 with the smooth case of 
Proposition 3.6. As promised in Section 3, this furnishes the proofs for these two propositions. 

Theorem 7.1. Suppose $\varphi(\omega) = \omega_1^{\tau_1} \cdots \omega_r^{\tau_r}$ and $f(\omega) = \omega_1^{\kappa_1} \cdots \omega_r^{\kappa_r}\, g(\omega)$ where $\kappa_1, \ldots, \kappa_r$ are
non-zero and the hypersurface $g(\omega) = 0$ is smooth and normal crossing (see definition below)
with $\omega_1, \ldots, \omega_r$. Let $\omega_0$ denote the function $g$ and let $\kappa_0 = 1$, $\tau_0 = 0$. Define

$$\ell = \min_{i \in I} \frac{\tau_i + 1}{\kappa_i}, \qquad J = \operatorname*{argmin}_{i \in I} \frac{\tau_i + 1}{\kappa_i}, \qquad m = |J|,$$

where $I$ is the set of all indices $0 \le i \le r$ such that $\omega_i$ has a zero in $\Omega$. Then we have

$$\mathrm{RLCT}_\Omega(f; \varphi) = (\ell, m),$$

provided the equations $\omega_i = 0$ for $i \in J$ have a solution in the interior of $\Omega$.
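
The combinatorial rule in Theorem 7.1 is simple to mechanize once the exponents are known. The helper below is a hypothetical sketch: it takes the exponents $\kappa_i$ and $\tau_i$ of the factors that vanish in $\Omega$ and returns $(\ell, m)$. It assumes, without checking, that the hypotheses of the theorem hold, in particular that the minimizing factors vanish simultaneously at an interior point.

```python
from fractions import Fraction


def rlct_from_exponents(kappa, tau=None):
    """Return (l, m) with l = min (tau_i + 1) / kappa_i and m = number of minimizers.

    kappa and tau list the exponents of the factors that vanish in Omega
    (a smooth factor g enters with kappa = 1, tau = 0). Factors with
    kappa = 0 do not occur in f and are skipped.
    """
    if tau is None:
        tau = [0] * len(kappa)
    ratios = [Fraction(t + 1, k) for k, t in zip(kappa, tau) if k > 0]
    ell = min(ratios)
    return ell, ratios.count(ell)


if __name__ == "__main__":
    print(rlct_from_exponents([1, 1, 2]))        # monomial a13 * a23 * a34^2
    print(rlct_from_exponents([1, 1, 1, 1]))     # monomial a13 * a14 * a23 * a25
    print(rlct_from_exponents([1, 2], [0, 3]))   # xi^2 * g with measure xi^3
```

The three calls correspond to the monomials of Examples 4.7 and 6.8 and to the first chart of Example 7.4, and they reproduce the thresholds stated there.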
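The normal crossing hypothesis in Theorem 7.1 means that the system

$$g = \omega_1 \frac{\partial g}{\partial \omega_1} = \cdots = \omega_r \frac{\partial g}{\partial \omega_r} = \frac{\partial g}{\partial \omega_{r+1}} = \cdots = \frac{\partial g}{\partial \omega_d} = 0$$

does not have any solutions in $\Omega$. See [12] to learn more about normal crossing singularities.
We begin with a technical lemma establishing that the RLCT can be computed locally.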

Lemma 7.2. For every $x \in \Omega$ there exists a neighborhood $\Omega_x \subset \Omega$ of $x$ such that

$$\mathrm{RLCT}_{\Omega_x}(f; \varphi) = \mathrm{RLCT}_U(f; \varphi)$$

for all neighborhoods $U \subset \Omega_x$ of $x$. Moreover,

$$\mathrm{RLCT}_\Omega(f; \varphi) = \min_x \mathrm{RLCT}_{\Omega_x}(f; \varphi)$$

where we take the minimum over all $x$ in the real analytic hypersurface $\{\omega \in \Omega : f(\omega) = 0\}$.

Proof. This comes from [14, Lemma 3.8, Proposition 3.9]. □

Proof of Theorem 7.1. Lemma 7.2 states that $\mathrm{RLCT}_\Omega(f; \varphi)$ is the minimum of $\mathrm{RLCT}_{\Omega_x}(f; \varphi)$
as $x$ varies over $\Omega$. Writing each subset $\Omega_x$ as $R_x \cap \Omega$ where $R_x$ is a sufficiently small
neighborhood of $x$ in $\mathbb{R}^d$, we claim that $\mathrm{RLCT}_\Omega(f; \varphi) = \min_{x \in \Omega} \mathrm{RLCT}_{R_x}(f; \varphi)$ if this minimum
is attained in the interior of $\Omega$. Indeed, for $x$ in the interior of $\Omega$, we get $\mathrm{RLCT}_{\Omega_x}(f; \varphi) =
\mathrm{RLCT}_{R_x}(f; \varphi)$. Otherwise, the volume of $\{\omega \in \Omega_x : |f(\omega)| \le \lambda\}$ is less than that of
$\{\omega \in R_x : |f(\omega)| \le \lambda\}$ for all $\lambda$. Hence $\mathrm{RLCT}_{R_x}(f; \varphi) \le \mathrm{RLCT}_{\Omega_x}(f; \varphi)$, and the claim follows.
Now to prove Theorem 7.1, it suffices to show that for each $x \in \Omega$ we have

$$\mathrm{RLCT}_{R_x}(f; \varphi) = \Bigl( \min_{i \in I_x} \frac{\tau_i + 1}{\kappa_i},\ \Bigl| \operatorname*{argmin}_{i \in I_x} \frac{\tau_i + 1}{\kappa_i} \Bigr| \Bigr)$$

where $I_x$ is the set of all indices $0 \le i \le r$ that satisfy $\omega_i(x) = 0$. Without loss of generality,
suppose $x = (x_1, \ldots, x_d)$ where $x_1 = \cdots = x_s = 0$ and $x_{s+1}, \ldots, x_r$ are nonzero. If $g(x) \ne 0$,
we may divide $f(\omega)$ by $g(\omega)$ without changing the RLCT in a sufficiently small neighborhood
$R_x$ of $x$. The RLCT of the remaining monomial is determined by [14, Proposition 3.7]. Now,
let us suppose $g(x) = 0$. Because $g(\omega)$ is normal crossing with $\omega_1, \ldots, \omega_r$, one of the derivatives
$\partial g / \partial \omega_j$ must be nonzero at $x$ for some $s + 1 \le j \le d$. We assume $R_x$ is sufficiently small so
that this derivative and $\omega_{s+1}, \ldots, \omega_r$ do not vanish. Consider the map $\sigma : R_x \to \mathbb{R}^d$ given by

$$\sigma_j(\omega) = g(\omega), \qquad \sigma_i(\omega) = \omega_i \ \text{ for } i \ne j.$$

The Jacobian matrix of $\sigma$ is nonsingular, so this map is an isomorphism onto its image. Set
$U = \sigma(R_x)$ and $\rho = \sigma^{-1} : U \to R_x$. Then, for all $\mu \in U$, we have

$$(f \circ \rho)(\mu) = \mu_1^{\kappa_1} \cdots \mu_s^{\kappa_s}\, \mu_j \cdot a(\mu) \quad \text{and} \quad (\varphi \circ \rho)(\mu) = \mu_1^{\tau_1} \cdots \mu_s^{\tau_s} \cdot b(\mu),$$

where the factors $a(\mu)$ and $b(\mu)$ do not vanish in $U$. By using the chain rule [14, Proposition
4.6], we get $\mathrm{RLCT}_{R_x}(f; \varphi) = \mathrm{RLCT}_U(f \circ \rho; \varphi \circ \rho)$. The latter RLCT can be computed once
again by dividing out the nonvanishing factors and applying [14, Proposition 3.7]. □

The hypersurface $\{f(\omega) = 0\}$ may not satisfy the hypothesis in Theorem 7.1. In that case,
we can try to simplify its singularities via a change of variables $\rho : U \to \Omega$. With some luck,
the transformed hypersurface $\{(f \circ \rho)(\mu) = 0\}$ will be described locally by monomials and the
RLCT can be computed using Theorem 7.1. More precisely, let $U$ be a $d$-dimensional real
analytic manifold and $\rho : U \to \Omega$ a real analytic map that is proper, i.e. the preimage of any
compact set is compact. Then $\rho$ desingularizes $f(\omega)$ if it satisfies the following conditions:

1. The map $\rho$ is an isomorphism outside the variety $\{\omega \in \Omega : f(\omega) = 0\}$.

2. Given any $y \in U$, there exists a local chart with coordinates $\mu_1, \ldots, \mu_d$ such that

$$(f \circ \rho)(\mu) = \mu_1^{\kappa_1} \cdots \mu_d^{\kappa_d} \cdot a(\mu), \qquad \det \partial\rho(\mu) = \mu_1^{\tau_1} \cdots \mu_d^{\tau_d} \cdot b(\mu)$$

where $\det \partial\rho$ is the Jacobian determinant, the exponents $\kappa_i, \tau_i$ are nonnegative integers
and the real analytic functions $a(\mu)$, $b(\mu)$ do not vanish at $y$.

If such a desingularization exists, then we may apply $\rho$ to the volume function (7) to calculate
the RLCT. Care must be taken to multiply the measure $\varphi$ with the Jacobian determinant
$|\det \partial\rho|$ in accordance with the change-of-variables formula for integrals.

Hironaka's celebrated theorem on the resolution of singularities [9, 10] guarantees that such
a desingularization exists for all real analytic functions $f(\omega)$. The proof employs transformations
known as blowups to simplify the singularities. We now describe the blowup $\rho : U \to \mathbb{R}^d$
of the origin in $\mathbb{R}^d$. The manifold $U$ can be covered by local charts $U_1, \ldots, U_d$ such that each
chart is isomorphic to $\mathbb{R}^d$ and each restriction $\rho_i : U_i \to \mathbb{R}^d$ is the monomial map

$$(\mu_1, \ldots, \mu_{i-1}, \xi, \mu_{i+1}, \ldots, \mu_d) \mapsto (\xi\mu_1, \ldots, \xi\mu_{i-1}, \xi, \xi\mu_{i+1}, \ldots, \xi\mu_d).$$

Here, the coordinate hypersurface $\xi = 0$, also called the exceptional divisor, runs through all
the charts. If the origin is locally the intersection of many smooth hypersurfaces with distinct
tangent hyperplanes, then these hypersurfaces can be separated by blowing up the origin [9].

Example 7.3. Consider the curve $\{f(x,y) = xy(x+y)(x-y) = 0\}$ in Figure 3(d). To resolve
this singularity, we blow up the origin. In the first chart, the map is $\rho_1 : (\xi, y_1) \mapsto (\xi, \xi y_1)$, so

$$f \circ \rho_1 = \xi^4\, y_1 (1 + y_1)(1 - y_1) \quad \text{and} \quad \det \partial\rho_1 = \xi.$$

The lines $\{y = 0\}$, $\{x + y = 0\}$ and $\{x - y = 0\}$ are transformed to $\{y_1 = 0\}$, $\{y_1 = -1\}$ and
$\{y_1 = 1\}$ respectively in this chart, thereby separating them. The line $\{x = 0\}$ does not show
up here, but it appears as $\{x_1 = 0\}$ in the second chart, where $\rho_2 : (x_1, \xi) \mapsto (\xi x_1, \xi)$ and

$$f \circ \rho_2 = \xi^4\, x_1 (x_1 + 1)(x_1 - 1), \qquad \det \partial\rho_2 = \xi.$$

Since the curve $\{(1 + y_1)(1 - y_1) = 0\}$ is normal crossing with $\xi, y_1$ in the first chart, we can
now apply Theorem 7.1. The chain rule [14, Proposition 4.6] shows that $\mathrm{RLCT}_{\Omega}(f; 1)$ is the
minimum of $\mathrm{RLCT}_{U_i}(f \circ \rho_i;\, \det \partial\rho_i)$ for $i = 1, 2$. In both charts, this RLCT equals $(\tfrac{1}{2}, 1)$. □
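
The two chart identities in Example 7.3 can be checked numerically. The following Python sketch is only an illustration: the helper names `f` and `blowup_chart` are ours, and sampling random points confirms the identities up to rounding error rather than proving them.

```python
import random

def f(x, y):
    """The curve of Example 7.3: f(x, y) = x*y*(x + y)*(x - y)."""
    return x * y * (x + y) * (x - y)

def blowup_chart(i, coords):
    """Chart map rho_i of the blowup of the origin (hypothetical helper).

    coords[i] plays the role of xi; every other coordinate is multiplied by xi.
    """
    xi = coords[i]
    return tuple(c if j == i else xi * c for j, c in enumerate(coords))

random.seed(0)
max_err = 0.0
for _ in range(1000):
    xi, t = random.uniform(-2, 2), random.uniform(-2, 2)
    # First chart: (xi, y1) -> (xi, xi*y1), with t in the role of y1.
    x, y = blowup_chart(0, (xi, t))
    max_err = max(max_err, abs(f(x, y) - xi**4 * t * (1 + t) * (1 - t)))
    # Second chart: (x1, xi) -> (xi*x1, xi), with t in the role of x1.
    x, y = blowup_chart(1, (t, xi))
    max_err = max(max_err, abs(f(x, y) - xi**4 * t * (t + 1) * (t - 1)))

print(max_err)  # largest deviation over all samples
```

In both charts the factor $\xi^4$ records the order of vanishing of $f$ at the origin, while the remaining factors are the separated lines.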

Example 7.4. Let $p = 4$ and $G$ be the almost complete DAG with $a_{13} = 0$. We consider the
conditional independence statement $1 \perp\!\!\!\perp 3 \mid 4$. The correlation hypersurface is defined by

$$f = \det(K_{12,23}) = a_{14}a_{23}^2a_{34} + a_{14}a_{23}a_{24} + a_{12}a_{24}a_{34} - a_{12}a_{23} + a_{14}a_{34}.$$

The real singular locus is a line in the parameter space $\mathbb{R}^5$, since $\mathrm{Sing}_{12,32} = \langle a_{12}, a_{14}, a_{23}, a_{34} \rangle$.
Blowing up this line in $\mathbb{R}^5$ creates four charts $U_1, U_2, U_3, U_4$. For instance, the first chart has

$$\rho_1 : U_1 \to \mathbb{R}^5, \quad (\xi, \mu_{14}, \mu_{23}, a_{24}, \mu_{34}) \mapsto (\xi, \xi\mu_{14}, \xi\mu_{23}, a_{24}, \xi\mu_{34}), \quad \det \partial\rho_1 = \xi^3.$$

Then $f$ transforms to $f \circ \rho_1 = \xi^2 \cdot g$ where $g = \mu_{14}\mu_{23}^2\mu_{34}\xi^2 + \mu_{14}\mu_{34} + \mu_{14}\mu_{23}a_{24} + \mu_{34}a_{24} - \mu_{23}$.
The hypersurface $\{g = 0\}$ has no real singularities, so it is smooth in $U_1$. We can thus apply
Theorem 7.1 with $I = \{0, 1\}$ to find $\mathrm{RLCT}_{U_1}(\xi^2 \cdot g;\, \xi^3) = (1, 1)$. The behavior is the same on
$U_2$, $U_3$ and $U_4$, and we conclude that $\mathrm{RLCT}(1,3|4) = (\ell, m) = (1, 1)$. This example is one
of the 12 cases that were labeled as "Blowup" in Table 2. The other 11 cases are similar. □
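
The factorization $f \circ \rho_1 = \xi^2 \cdot g$ in Example 7.4 can be confirmed at random sample points. This is a minimal sketch with function names of our own choosing (`f`, `g`, `rho1`); it is a numerical sanity check, not a symbolic proof.

```python
import random

def f(a12, a14, a23, a24, a34):
    """Polynomial f of Example 7.4."""
    return (a14 * a23**2 * a34 + a14 * a23 * a24
            + a12 * a24 * a34 - a12 * a23 + a14 * a34)

def g(xi, m14, m23, a24, m34):
    """Cofactor g in the first chart, so that f(rho1(...)) = xi**2 * g(...)."""
    return (m14 * m23**2 * m34 * xi**2 + m14 * m34
            + m14 * m23 * a24 + m34 * a24 - m23)

def rho1(xi, m14, m23, a24, m34):
    """First chart of the blowup along the line a12 = a14 = a23 = a34 = 0."""
    return (xi, xi * m14, xi * m23, a24, xi * m34)

random.seed(1)
worst = 0.0
for _ in range(1000):
    pt = [random.uniform(-1, 1) for _ in range(5)]
    worst = max(worst, abs(f(*rho1(*pt)) - pt[0] ** 2 * g(*pt)))

print(worst)  # largest deviation over all samples
```

Three of the five coordinates are scaled by $\xi$ in `rho1`, which is why the Jacobian determinant is $\xi^3$.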

Example 7.5. In Example 6.5 we claimed that $\mathrm{RLCT}(1,2|5) = (1,3)$ for $G = \mathrm{Tripart}_{5,1}$.
We now prove this claim by using the blowing up method. The polynomial in question is

$$f = \det(K_{1,2|5}) = (a_{13}a_{35} + a_{14}a_{45})(a_{23}a_{35} + a_{24}a_{45}).$$

The singular locus of the hypersurface $\{f = 0\}$ is given by

$$\mathrm{Sing}_{134,234} = \langle a_{35}, a_{45} \rangle \cap \left\langle 2 \times 2\text{-minors of } \begin{pmatrix} a_{13} & a_{23} & a_{45} \\ a_{14} & a_{24} & -a_{35} \end{pmatrix} \right\rangle.$$

We blow up the linear subspace $\{a_{35} = a_{45} = 0\}$ in $\mathbb{R}^6$. This creates two charts. The map for
the first chart is $\rho_1 : (a_{13}, a_{14}, a_{23}, a_{24}, \xi, \mu_{45}) \mapsto (a_{13}, a_{14}, a_{23}, a_{24}, \xi, \xi\mu_{45})$. This map gives

$$f \circ \rho_1 = \xi^2 (a_{13} + a_{14}\mu_{45})(a_{23} + a_{24}\mu_{45}), \qquad \det \partial\rho_1 = \xi.$$

Now, by setting $a_{13} = x - a_{14}\mu_{45}$ and $a_{23} = y - a_{24}\mu_{45}$, the transformed function $f \circ \rho_1$ is the
monomial $\xi^2 xy$. Then Theorem 7.1 can be employed to evaluate $\mathrm{RLCT}_{U_1}(\xi^2 xy;\, \xi) = (1, 3)$.
The calculation in the second chart is completely analogous. □
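
Once a chart is in monomial form, the RLCT values quoted in Examples 7.3 to 7.5 follow from simple arithmetic on exponents. The sketch below assumes that Theorem 7.1 (stated earlier in the paper) takes the usual form $\ell = \min_i (\tau_i + 1)/\kappa_i$, with $m$ the number of indices attaining the minimum. The function name `rlct_monomial` is ours, and a smooth transversal factor such as $g$ is treated as an extra coordinate with $\kappa = 1$, $\tau = 0$.

```python
from fractions import Fraction

def rlct_monomial(kappa, tau):
    """RLCT (l, m) for a monomial with exponents kappa and measure exponents tau.

    Assumed rule: l = min over i with kappa_i > 0 of (tau_i + 1) / kappa_i,
    and m = number of indices attaining that minimum.
    """
    ratios = [Fraction(t + 1, k) for k, t in zip(kappa, tau) if k > 0]
    l = min(ratios)
    return l, ratios.count(l)

# Example 7.3, near xi = y1 = 0: f o rho_1 ~ xi^4 * y1, Jacobian xi.
print(rlct_monomial((4, 1), (1, 0)))        # (Fraction(1, 2), 1)
# Example 7.4: xi^2 * g with Jacobian xi^3; g acts as a smooth coordinate.
print(rlct_monomial((2, 1), (3, 0)))        # (Fraction(1, 1), 1)
# Example 7.5: xi^2 * x * y with Jacobian xi.
print(rlct_monomial((2, 1, 1), (1, 0, 0)))  # (Fraction(1, 1), 3)
```

Exact rational arithmetic avoids spurious ties or near-ties when counting the multiplicity.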

The same approach as in Example 7.5 can be applied to the polynomial $f = \det(K_{12|56})$ in
Example 2.1. A lengthy calculation, involving many charts and multiple blowups, eventually
reveals that $G = \mathrm{Tripart}_{6,2}$ satisfies $\mathrm{RLCT}(1,2|56) = (1,1)$. This was stated in Example 6.6.



8 Computing the constants 

We now describe a method for finding the constant $C$ in the formula $V(\lambda) \approx C\lambda^{\ell}(-\ln \lambda)^{m-1}$
in (7). The two theorems in this section are new. They extend the results of Greenblatt [6]
and Lasserre [13] on the volumes of sublevel sets. We begin by showing that the constant $C$
is a function of the highest order term in the Laurent expansion of the zeta function of $f$.
Lemma 8.1. Given real analytic functions $f, \varphi : \Omega \to \mathbb{R}$, consider the Laurent expansion of

$$\zeta(z) = \int_{\Omega} |f(\omega)|^{-z} \varphi(\omega)\, d\omega = \frac{a_{\ell,m}}{(\ell - z)^m} + \frac{a_{\ell,m-1}}{(\ell - z)^{m-1}} + \cdots$$

where $\ell$ is the smallest pole and $m$ its multiplicity. Then, asymptotically as $\lambda$ tends to zero,

$$V(\lambda) := \int_{|f(\omega)| \le \lambda} \varphi(\omega)\, d\omega \;\approx\; \frac{a_{\ell,m}}{\ell\,(m-1)!}\, \lambda^{\ell} (-\ln \lambda)^{m-1}.$$

Proof. According to the proof of [18, Theorem 7.1], the volume function $V(\lambda)$ equals $\int_0^{\lambda} v(s)\, ds$
where $v(s) = \int_{\Omega} \delta(s - f(\omega))\, \varphi(\omega)\, d\omega$ is the state density function and $\delta$ is the delta function.
Now, using the proof of [14, Theorem 3.16], we obtain

$$v(s) = \frac{a_{\ell,m}}{(m-1)!}\, s^{\ell-1} (-\ln s)^{m-1} + o\!\left( s^{\ell-1} (-\ln s)^{m-1} \right) \quad \text{as } s \to 0.$$

Here we used the little-o notation. Finally, using integration by parts, we find that

$$V(\lambda) = \frac{a_{\ell,m}}{\ell\,(m-1)!}\, \lambda^{\ell} (-\ln \lambda)^{m-1} + o\!\left( \lambda^{\ell} (-\ln \lambda)^{m-1} \right) \quad \text{as } \lambda \to 0. \qquad □$$
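
Lemma 8.1 can be illustrated on a toy case where everything is known in closed form. We take, as an assumption of our own and not an example from the paper, $f = xy$ on $\Omega = [0,1]^2$ with $\varphi \equiv 1$. Then $\zeta(z) = 1/(1-z)^2$, so $\ell = 1$, $m = 2$ and $a_{1,2} = 1$, while the exact volume is $V(\lambda) = \lambda(1 - \ln \lambda)$. The sketch compares the exact volume with the leading term predicted by the lemma.

```python
import math

def exact_volume(lam):
    """Area of {(x, y) in [0,1]^2 : x*y <= lam}, valid for 0 < lam <= 1."""
    return lam * (1.0 - math.log(lam))

def lemma_leading_term(lam, a=1.0, l=1.0, m=2):
    """Leading term a / (l * (m-1)!) * lam**l * (-ln lam)**(m-1) of Lemma 8.1."""
    return a / (l * math.factorial(m - 1)) * lam**l * (-math.log(lam)) ** (m - 1)

for lam in (1e-2, 1e-4, 1e-8, 1e-16):
    ratio = exact_volume(lam) / lemma_leading_term(lam)
    print(f"{lam:.0e}  {ratio:.4f}")
```

The ratio equals $1 + 1/(-\ln \lambda)$, so it tends to 1 only logarithmically. This is consistent with the $o(\cdot)$ remainder in the lemma and shows that the leading term can be a rough approximation at moderate $\lambda$.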

Example 8.2. In Example 3.3, we saw that the volume of the $d$-dimensional ball defined by
$\omega_1^2 + \cdots + \omega_d^2 \le \lambda$ is equal to $V(\lambda) = C\lambda^{d/2}$ for some positive constant $C$. We here show how
to compute that constant using asymptotic methods. By Lemma 8.1, $C = 2a/(d\,2^d)$ where $a$
is the coefficient of $(d/2 - z)^{-1}$ in the Laurent expansion of the zeta function

$$\zeta(z) = \int_{\Omega} |\omega_1^2 + \cdots + \omega_d^2|^{-z}\, d\omega.$$

Computing this Laurent coefficient from first principles is not easy. Instead, we derive $a$ using
the asymptotic theory of Laplace integrals. The connection between such integrals and volume
functions was alluded to in Definition 3.2. By [14, Proposition 5.2], the Laplace integral

$$Z(N) = \int_{\mathbb{R}^d} e^{-N(\omega_1^2 + \cdots + \omega_d^2)}\, d\omega$$

is asymptotically $a\,\Gamma(\tfrac{d}{2})\, N^{-d/2}$ for large $N$. But this Laplace integral also decomposes as

$$Z(N) = \int_{\mathbb{R}} e^{-N\omega_1^2}\, d\omega_1 \cdots \int_{\mathbb{R}} e^{-N\omega_d^2}\, d\omega_d = \left( \sqrt{\pi}\, N^{-1/2} \right)^d,$$

where each factor is the classical Gaussian integral. Solving for $a$ leads to the formula

$$C = \frac{\pi^{d/2}}{2^d \cdot \Gamma(\tfrac{d}{2}) \cdot \tfrac{d}{2}} = \frac{\pi^{d/2}}{2^d \cdot \Gamma(\tfrac{d}{2} + 1)}.$$
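
The algebra in Example 8.2 is easy to verify numerically. The sketch below solves $a\,\Gamma(d/2)\,N^{-d/2} = (\sqrt{\pi}\,N^{-1/2})^d$ for $a$, forms $C = 2a/(d\,2^d)$, and compares it with the closed form. The function names are ours. Reading the factor $2^d$ as the normalization of the uniform density on $[-1,1]^d$ is our interpretation and is used only for the two sanity checks.

```python
import math

def laurent_coefficient(d):
    """a = pi^(d/2) / Gamma(d/2), from a * Gamma(d/2) * N^(-d/2) = (sqrt(pi/N))^d."""
    return math.pi ** (d / 2) / math.gamma(d / 2)

def ball_constant(d):
    """C = 2a / (d * 2^d), as obtained from Lemma 8.1 in Example 8.2."""
    return 2 * laurent_coefficient(d) / (d * 2**d)

def ball_constant_closed_form(d):
    """C = pi^(d/2) / (2^d * Gamma(d/2 + 1))."""
    return math.pi ** (d / 2) / (2**d * math.gamma(d / 2 + 1))

for d in range(1, 7):
    print(d, ball_constant(d), ball_constant_closed_form(d))
```

For $d = 1$ this gives $C = 1$, matching the interval $|\omega_1| \le \sqrt{\lambda}$ measured with density $1/2$. For $d = 2$ it gives $C = \pi/4$, matching a disc of area $\pi\lambda$ measured with density $1/4$.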

In Section 3, we saw how the RLCTs of smooth hypersurfaces and of hypersurfaces defined 
by monomial functions can be computed. The following two theorems and their accompanying 
examples demonstrate how the asymptotic constant C can also be evaluated in those instances. 
Theorem 8.3. Let $\{f = 0\}$ be a smooth hypersurface and let $\varphi : \Omega \to \mathbb{R}$ be positive. Suppose
$\partial f/\partial\omega_1$ is nonvanishing in $\Omega$. Let $W$ be the projection of the hypersurface $\{f = 0\} \subset \Omega$ onto
the subspace $\{(\omega_2, \ldots, \omega_d) \in \mathbb{R}^{d-1}\}$ and let $\rho : \Omega \to \mathbb{R}^d$ be the map $\omega \mapsto (f(\omega), \omega_2, \ldots, \omega_d)$.
If the boundary of $\Omega$ intersects transversally with the hypersurface $\{f = 0\}$, then

$$V(\lambda) := \int_{\{\omega \in \Omega \,:\, |f(\omega)| \le \lambda\}} \varphi(\omega)\, d\omega \;\approx\; C\lambda$$

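asymptotically (as $\lambda \to 0$), where

$$C = 2 \int_W \frac{\varphi}{|\partial f/\partial\omega_1|} \circ \rho^{-1}(0, \omega_2, \ldots, \omega_d)\; d\omega_2 \cdots d\omega_d.$$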

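Proof. The asymptotics of the volume $V(\lambda)$ depends only on the region $\{\omega \in \Omega : |f(\omega)| \le \lambda\}$.
So we may assume that $\Omega$ is a small neighborhood of the hypersurface $\{f(\omega) = 0\}$. As we
saw in the proof of Theorem 7.1, the map $\rho$ is an isomorphism onto its image. Thus after
changing variables, the zeta function associated to $V(\lambda)$ becomes

$$\begin{aligned}
\zeta(z) &= \int_{\rho(\Omega)} |f|^{-z}\, \frac{\varphi}{|\partial f/\partial\omega_1|} \circ \rho^{-1}(f, \omega_2, \ldots, \omega_d)\; df\, d\omega_2 \cdots d\omega_d \\
&= \int_W \int_{\varepsilon_1}^{\varepsilon_2} |f|^{-z}\, \frac{\varphi}{|\partial f/\partial\omega_1|} \circ \rho^{-1}(f, \omega_2, \ldots, \omega_d)\; df\, d\omega_2 \cdots d\omega_d.
\end{aligned}$$

Here, the lower and upper limits $\varepsilon_1, \varepsilon_2$ are nonzero because the boundary of $\Omega$ is transversal
to the hypersurface. By substituting the Taylor series

$$\frac{\varphi}{|\partial f/\partial\omega_1|} \circ \rho^{-1}(f, \omega_2, \ldots, \omega_d) = \frac{\varphi}{|\partial f/\partial\omega_1|} \circ \rho^{-1}(0, \omega_2, \ldots, \omega_d) + O(f)$$

and the exponential series $\varepsilon_2^{1-z} = 1 + O(1 - z)$, we get the Laurent expansion

$$\begin{aligned}
\int_0^{\varepsilon_2} |f|^{-z}\, \frac{\varphi}{|\partial f/\partial\omega_1|} \circ \rho^{-1}(f, \omega_2, \ldots, \omega_d)\; df
&= \frac{\varepsilon_2^{1-z}}{1-z}\, \frac{\varphi}{|\partial f/\partial\omega_1|} \circ \rho^{-1}(0, \omega_2, \ldots, \omega_d) + \cdots \\
&= \frac{1}{1-z}\, \frac{\varphi}{|\partial f/\partial\omega_1|} \circ \rho^{-1}(0, \omega_2, \ldots, \omega_d) + \cdots.
\end{aligned}$$

The same is true for the integral from $\varepsilon_1$ to $0$. The result now follows from Lemma 8.1. □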

Example 8.4. By Theorem 4.1, all conditional independence statements in small complete
graphs lead to smooth hypersurfaces. Here we analyze the statement $1 \perp\!\!\!\perp 2 \mid 3$ in the complete
3-node DAG. This example was studied in [17, §2]. The corresponding partial correlation is

$$\mathrm{corr}(1, 2 \,|\, 3) = \frac{a_{12} - a_{13}a_{23}}{\sqrt{1 + a_{23}^2}\, \sqrt{1 + a_{12}^2 + a_{13}^2}}.$$

This partial correlation hypersurface lives in $\mathbb{R}^3$ and it is depicted in [17, Figure 2(b)].

We apply Theorem 8.3 by setting $\Omega := [-1,1]^3$, $f := \mathrm{corr}(1,2|3)$ and $\varphi := 1/2^3$, the
uniform distribution on $\Omega$. We choose $\omega_1$ to be $a_{12}$. Then $\rho^{-1}(0, a_{13}, a_{23}) = (a_{13}a_{23}, a_{13}, a_{23})$.
The projection $W$ of the surface $\{a_{12} = a_{13}a_{23}\}$ onto $\{(a_{13}, a_{23}) \in [-1,1]^2\}$ is the whole
square $[-1,1]^2$. The formula for the constant $C$ now simplifies to

$$C = \int_W \sqrt{1 + a_{23}^2}\, \sqrt{1 + a_{13}^2 + a_{13}^2 a_{23}^2}\; da_{13}\, da_{23} \approx 5.4829790759.$$

This two-dimensional integral was evaluated numerically using Mathematica. □
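
The numerical evaluation in Example 8.4 can be reproduced without Mathematica. The following sketch uses a tensor-product composite Simpson rule in pure Python. The function names and the grid size `n=200` are our choices. It evaluates only the two-dimensional integral over the square $[-1,1]^2$ and makes no claim about constant prefactors.

```python
import math

def integrand(a13, a23):
    """sqrt(1 + a23^2) * sqrt(1 + a13^2 + a13^2 * a23^2)."""
    return math.sqrt(1 + a23**2) * math.sqrt(1 + a13**2 + a13**2 * a23**2)

def simpson_weights(n):
    """Composite Simpson weights on n subintervals (n even), without the factor h/3."""
    return [1 if i in (0, n) else 4 if i % 2 else 2 for i in range(n + 1)]

def integrate_square(func, lo=-1.0, hi=1.0, n=200):
    """Tensor-product Simpson rule over the square [lo, hi]^2."""
    h = (hi - lo) / n
    w = simpson_weights(n)
    xs = [lo + i * h for i in range(n + 1)]
    total = sum(w[i] * w[j] * func(xs[i], xs[j])
                for i in range(n + 1) for j in range(n + 1))
    return total * (h / 3) ** 2

value = integrate_square(integrand)
print(value)
```

Because the integrand is smooth on the square, Simpson's rule converges quickly. The result should agree with the value quoted in the example to several decimal places.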

We now come to the monomial case that was discussed in Theorem 7.1. 

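Theorem 8.5. Let $f(\omega) = \omega_1^{\kappa_1} \cdots \omega_d^{\kappa_d}\, g(\omega)$ where $g : \Omega \to \mathbb{R}$ has no real zeros. Let $\varphi : \Omega \to \mathbb{R}$
be positive. Suppose that $1/\ell = \kappa_1 = \cdots = \kappa_m > \kappa_{m+1} \ge \cdots \ge \kappa_d$ and that the boundary of $\Omega$
is transversal to the subspace $L$ defined by $\omega_1 = \cdots = \omega_m = 0$. Let $\bar\omega$ and $\bar\kappa$ denote the vectors
$(\omega_{m+1}, \ldots, \omega_d)$ and $(\kappa_{m+1}, \ldots, \kappa_d)$ respectively. Then

$$V(\lambda) := \int_{|f(\omega)| \le \lambda} \varphi(\omega)\, d\omega \;\approx\; C\lambda^{\ell}(-\ln \lambda)^{m-1}$$

asymptotically as $\lambda$ tends to zero where

$$C = \frac{(2\ell)^m}{\ell\,(m-1)!} \int_{\Omega \cap L} |\bar\omega|^{-\ell\bar\kappa}\; g(0, \ldots, 0, \bar\omega)^{-\ell}\; \varphi(0, \ldots, 0, \bar\omega)\; d\bar\omega. \qquad (24)$$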

Proof. Let us suppose for now that $\Omega$ is the hypercube $[0,\varepsilon]^d$. Because $f(\omega)$ is nondegenerate
with respect to its Newton polyhedron [14, §4.2.1], we may apply the last formula in the proof
of [14, Theorem 5.11], with $\rho_{\mathcal{F}}$ equal to the identity map. From this we obtain the desired
Laurent coefficient $a_{\ell,m}$ of the zeta function, namely

$$a_{\ell,m} = \prod_{i=1}^{m} \frac{1}{\kappa_i} \int_{[0,\varepsilon]^{d-m}} \bar\omega^{-\ell\bar\kappa}\; g(0, \ldots, 0, \bar\omega)^{-\ell}\; \varphi(0, \ldots, 0, \bar\omega)\; d\bar\omega.$$

More generally, when the boundary of $\Omega$ is transversal to the subspace $L$, we decompose $\Omega$
into small neighborhoods which are isomorphic to orthants. Summing up over these orthants
and applying Lemma 8.1 gives the desired result. □

Remark 8.6. We revisit the planar tubes shown in Figures 3(a)-3(c). Using the formula (24)
in Theorem 8.5, one can easily check the constants $C$ we saw in Example 3.1, namely

$$C = \begin{cases} 1 & \text{for } f(x,y) = x, \\ 1 & \text{for } f(x,y) = xy, \\ 3 & \text{for } f(x,y) = x^2 y^3. \end{cases}$$
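
For pure monomials, formula (24) reduces to a product of one-dimensional integrals, so the three constants of Remark 8.6 can be recomputed exactly. The sketch below rests on assumptions of our own: $g \equiv 1$, $\Omega = [-1,1]^d$ and the uniform density $\varphi = 2^{-d}$ (the setup of Example 3.1 is not restated in this section). The function name `monomial_constant` is hypothetical.

```python
from fractions import Fraction
from math import factorial

def monomial_constant(kappa):
    """Constant C of formula (24) for f = prod |w_i|^kappa_i with g = 1.

    Assumes Omega = [-1, 1]^d with uniform density 2^(-d) (our assumption).
    """
    d = len(kappa)
    top = max(kappa)
    l = Fraction(1, top)                    # l = 1 / (largest exponent)
    m = sum(1 for k in kappa if k == top)   # multiplicity of the largest exponent
    rest = [k for k in kappa if k != top]
    # Integral over the slice Omega intersect L of prod |w_j|^(-l*k_j) * 2^(-d);
    # each factor is int_{-1}^{1} |w|^(-l*k) dw = 2 / (1 - l*k), finite since l*k < 1.
    integral = Fraction(1, 2**d)
    for k in rest:
        integral *= 2 / (1 - l * k)
    return (2 * l) ** m / (l * factorial(m - 1)) * integral

print(monomial_constant((1, 0)))  # f = x
print(monomial_constant((1, 1)))  # f = x*y
print(monomial_constant((2, 3)))  # f = x^2 * y^3
```

Under these assumptions the three calls return 1, 1 and 3, in agreement with the remark. For $f = x^2y^3$ the dominant exponent is that of $y$, so $\ell = 1/3$ and the remaining integral is $\int_{-1}^{1} |x|^{-2/3}\, dx = 6$.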

Example 8.7. We apply Theorem 8.5 to find the constants in Corollary 5.3 for chains and
stars. In both cases we set $\Omega = [-1,1]^{p-1}$ and $\varphi = 2^{1-p}$. For chains we have $(\ell, m) = (1, p-1)$
and $L$ is the subspace $a_{12} = \cdots = a_{p-1,p} = 0$. Then the integral in (24) is the evaluation of
the denominator of (13) at the origin multiplied by $\varphi$, so $C_{\mathrm{chain}} = 1/(p-2)!$ as claimed.
For stars we have $(\ell, m) = (1, 2)$ and one can check that $\mathrm{corr}(i,j) \ge \mathrm{corr}(i,j|S)$ for all
triples $i, j, S$. Hence, $V_G(\lambda)$ is the volume of the union of the tubes $\{|\mathrm{corr}(i,j)| \le \lambda\}$ where
$1 < i < j$. By applying formula (24) to the function (14), the asymptotic volume of each tube
computes to $\lambda(-\ln \lambda)$. Meanwhile, as $\lambda \to 0$, the volumes of the intersections of these tubes
become negligible. Therefore $C_{\mathrm{star}}$ equals $\binom{p-1}{2}$, the number of pairs $1 < i < j$. □

Example 8.8. We compute the constant $C$ of the volume $V_{1,2|4,\ldots,p}(\lambda)$ for $G = \mathrm{Tripart}_{p,p-3}$
as in Example 6.3. Let $p \ge 6$ and $\Omega$ the parameter space (20). By symmetry, it suffices to
study the intersection of $\Omega$ with the positive orthant. Let $g(a) = \sum_{k=4}^{p} a_{3k}^2$. To apply
Lemma 8.1, we need to derive the coefficient of $(1 - z)^{-2}$ in the Laurent expansion of

$$\zeta(z) = \int \left( a_{13}a_{23}\, \frac{g(a)}{h(a)} \right)^{-z} da, \quad \text{where } h(a) = \sqrt{1 + g(a)(a_{13}^2 + 1)} \cdot \sqrt{1 + g(a)(a_{23}^2 + 1)}.$$

The Taylor series expansion of the integrand about $a_{13} = a_{23} = 0$ is

$$\left( a_{13}a_{23}\, \frac{g(a)}{h(a)} \right)^{-z} = (a_{13}a_{23})^{-z} \left[ \left( \frac{g(a)}{1 + g(a)} \right)^{-z} + O(a_{13}) + O(a_{23}) \right].$$

The higher order terms in the Taylor expansion contribute larger poles to the zeta function
$\zeta(z)$, so we only need to study the Laurent series

$$\int (a_{13}a_{23})^{-z} \left( \frac{g(a)}{1 + g(a)} \right)^{-z} da_{13}\, da_{23}\, da = \frac{1}{(1-z)^2} \int \left( \frac{g(a)}{1 + g(a)} \right)^{-1} da + \cdots$$

By using the Lebesgue probability measure on the ball $\{g(a) \le 1\}$ and substituting spherical
coordinates for the integration, we find that the above coefficient of $(1 - z)^{-2}$ equals

$$1 + \int_{\{g(a) \le 1\}} \frac{1}{g(a)}\, da = 2 + \frac{2}{p-5}.$$

This is the constant $C_{1,2|4,\ldots,p}$ needed for the bias reduction analysis in Example 6.3. □
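
The last step of Example 8.8 is a radial integral that can be checked exactly. Under the uniform probability measure on the unit ball of $\mathbb{R}^n$, the radius has density $n r^{n-1}$, so the mean of $1/g = r^{-2}$ is $n \int_0^1 r^{n-3}\, dr = n/(n-2)$. The sketch below applies this with $n = p - 3$, the number of variables $a_{34}, \ldots, a_{3p}$. The function names are ours.

```python
from fractions import Fraction

def mean_inverse_g(n):
    """E[1/g] for the uniform probability measure on the unit ball of R^n, g = |a|^2.

    The radial density is n * r^(n-1), so E[r^-2] = n / (n - 2), which requires n > 2.
    """
    if n <= 2:
        raise ValueError("the integral diverges for n <= 2")
    return Fraction(n, n - 2)

def coefficient(p):
    """1 + E[1/g] with n = p - 3 variables a_34, ..., a_3p."""
    return 1 + mean_inverse_g(p - 3)

for p in range(6, 11):
    # Compare with the closed form 2 + 2/(p - 5) stated in Example 8.8.
    assert coefficient(p) == 2 + Fraction(2, p - 5)
    print(p, coefficient(p))
```

The divergence for $n \le 2$ is the reason the closed form needs $p - 5 > 0$.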